PREFACE


This book presents the theory of a recently developed method of 
statistical inference, that of sequential analysis. An effort has been 
made to keep the exposition on a level that will make most of the book, 
with the exception of the Appendix, understandable to readers whose 
mathematical background does not go beyond college algebra and a 
first course in calculus. Some knowledge of probability and statistics 
is desirable for the understanding of the book, although not essential, 
for a brief review is given of the fundamental concepts, such as random 
variables, probability distributions, and statistical hypotheses. 

To facilitate the reading of the book for those who have no advanced 
mathematical training, some concessions are made to generality and 
occasionally even to rigor. Furthermore, mathematical derivations 
of somewhat intricate nature are put into the Appendix, the reading 
of which may be omitted without impairing the understanding of the 
rest of the book. 

This book contains an expanded exposition of the ideas and results 
I published in two technical papers on this subject, one of which 
appeared in 1944 and the other in 1945, as well as some further
developments. Such developments, for example, are: the discussion of
multi-valued decisions and estimation in Part III; improvements in the
limits for the average number of observations required by a sequential 
test; and limits for the effect of grouping in the binomial case. Some 
recent results of M. A. Girshick are included and, in the discussion of
certain applications in Part II, use is made of some simplifications con- 
tained in a publication of the Statistical Research Group of Columbia 
University dealing with these applications. 

Nearly all tables in the book were computed by the Statistical 
Research Group of Columbia University while I was a consultant to 
the group. A few sections of my two aforementioned publications have
been incorporated in this book, mostly in the Appendix, without sub- 
stantial changes. 

I wish to express my indebtedness to Milton Friedman and W. Allen 
Wallis, who proposed the problem of sequential analysis to me in 
March, 1943. It was their clear formulation of the problem that gave 
me the incentive to start the investigations leading to the present
developments. I also wish to express my thanks to the Social Science
Research Council for their help, which facilitated the publication of 
this book. I am indebted to Mr. Mortimer Spiegelman of the Metro- 
politan Life Insurance Company for his careful reading of the manu- 
script and for making several valuable suggestions. Thanks are due 
also to Mrs. E. Bowker who prepared the manuscript with particular 
care. 

A. W. 

Columbia University 
March, 1947

INTRODUCTION


Sequential analysis is a method of statistical inference whose charac- 
teristic feature is that the number of observations required by the 
procedure is not determined in advance of the experiment. The deci- 
sion to terminate the experiment depends, at each stage, on the results 
of the observations previously made. A merit of the sequential method, 
as applied to testing statistical hypotheses, is that test procedures can 
be constructed which require, on the average, a substantially smaller 
number of observations than equally reliable test procedures based on 
a predetermined number of observations. 

This book presents the theory of a particular method of sequential 
analysis, the so-called sequential probability ratio test, which was de- 
vised by the author in 1943 mainly for the purpose of testing statistical 
hypotheses. Compared with any other test procedure (sequential or
non-sequential), this particular sequential test procedure is shown, in
Section A.7, to effect the greatest possible saving in the average number
of observations when used for testing a simple hypothesis against a single
alternative. The sequential probability ratio test frequently results in a
saving of about 50 per cent in the number of observations over the 
most efficient test procedure based on a fixed number of observations. 

The first idea of a sequential test procedure, i.e., a test for which the
number of observations is not determined in advance but is dependent
on the outcome of the observations as they are made, goes back to
H. F. Dodge and H. G. Romig 1 who constructed a double sampling 
procedure. According to this scheme the decision whether or not a 
second sample should be drawn depends on the outcome of the obser- 
vations in the first sample. Whereas this method allows for only two 
samples, Walter Bartky devised a multiple sampling scheme for the 
particular case of testing the mean of a binomial distribution. 2 His 
scheme is closely related to the test procedure that results from the 
application of the sequential probability ratio test to this particular 
case. The reason that Dodge and Romig introduced their double
sampling method, and Bartky his multiple sampling scheme, was, of
course, the recognition of the fact that they require, on the average,
a smaller number of observations than “single” sampling.

1 H. F. Dodge and H. G. Romig, “A Method of Sampling Inspection,” The
Bell System Technical Journal, Vol. 8 (1929), pp. 613-631.

2 Walter Bartky, “Multiple Sampling with Constant Probability,” The Annals
of Mathematical Statistics, Vol. 14 (1943), pp. 363-377.

The occasional practice of designing a large scale experiment in suc- 
cessive stages may be regarded as a forerunner of sequential analysis. 
The idea of such chain experiments was briefly discussed by Harold 
Hotelling. 3 A very interesting example of this type is the series of
sample censuses of area of jute in Bengal carried out under the
direction of P. C. Mahalanobis. 4 Sample censuses, steadily increasing in
size, were taken primarily for the purpose of obtaining preliminary in- 
formation about the parameters to be estimated. This information 
was then used for designing the final sampling of the whole immense 
jute area in Bengal. 

The problem of sequential analysis arose in the Statistical Research 
Group of Columbia University 5 in connection with some comments
made by Captain G. L. Schuyler of the Bureau of Ordnance, Navy 
Department. Milton Friedman and W. Allen Wallis recognized the 
great potentialities and the far-reaching consequences that sequential 
analysis might have for the further development of theoretical sta- 
tistics. In particular, they conjectured that a sequential test proce- 
dure might be constructed which would control the possible errors 
committed by wrong decisions exactly to the same extent as the best 
current procedure based on a predetermined number of observations, 
and at the same time would require, on the average, a substantially 
smaller number of observations than the fixed number of observations 
needed for the current procedure. 6 Friedman and Wallis also
exhibited a few examples of sequential modifications of current test
procedures resulting, in some cases, in an increase of efficiency. It was
at this stage that they proposed the problem of sequential analysis to 
the author. This gave the incentive for the author's investigations 
which then led to the development of the sequential probability ratio 
test. 

3 Harold Hotelling, “Experimental Determination of the Maximum of a Function,”
The Annals of Mathematical Statistics, Vol. 12 (1941), pp. 20-45.

4 P. C. Mahalanobis, “A Sample Survey of the Acreage under Jute in Bengal,
with Discussion on Planning of Experiments,” Proceedings of the 2nd Indian
Statistical Conference, Calcutta, Statistical Publishing Society (1940).

5 During World War II the Statistical Research Group operated under a
contract with the Office of Scientific Research and Development and was directed
by the Applied Mathematics Panel of the National Defense Research Committee. 

6 Bartky's multiple sampling scheme for testing the mean of a binomial
distribution provides an example of such a sequential test. His results were not known to
Friedman and Wallis at that time, since they were published nearly a year later.

Because of the usefulness of the sequential probability ratio test in
development work on military and naval equipment, it was classified 
Restricted within the meaning of the Espionage Act. The author was 
requested to submit his findings in a restricted report 7 dated
September, 1943. 8 In this report the sequential probability ratio test is
devised and the basic theory is given. To facilitate the use of this
new technique by the Army and the Navy, the Statistical Research
Group issued a second report in July, 1944, which gives an elementary 
non-mathematical exposition of the applications of the sequential prob- 
ability ratio test and contains a considerable number of tables, charts, 
and computational simplifications to facilitate applications. 9 

Further advances in the theory of the sequential probability ratio 
test were made in 1944. The operating characteristic (OC) curve of 
the sequential probability ratio test for the case of a binomial distri- 
bution was found by Milton Friedman and George W. Brown (inde- 
pendently of each other), and slightly earlier by C. M. Stockman in 
England. 10 The author then obtained the general OC curve for any 
sequential probability ratio test. 11 A few months later a general 
theory of cumulative sums was developed 12 which gives not only the 
OC curve of any sequential probability ratio test but also the charac- 
teristic function of the number of observations required by the test 
and various other results. 

The material in the author’s report together with the new advances 
made in 1944 were published by him in a paper, “Sequential Tests of 
Statistical Hypotheses,” in The Annals of Mathematical Statistics, June, 
1945. The Statistical Research Group issued a revised edition 13 of its
original report. The revised edition includes a discussion of the
operating characteristic and average sample number curves for various
applications of the sequential probability ratio test.

7 Abraham Wald, “Sequential Analysis of Statistical Data: Theory,” a report
submitted by the Statistical Research Group, Columbia University, to the Applied
Mathematics Panel, National Defense Research Committee, Sept., 1943.

8 The restricted classification was removed in May, 1945.

9 Harold Freeman, “Sequential Analysis of Statistical Data: Applications,” a
report submitted by the Statistical Research Group, Columbia University, to the
Applied Mathematics Panel, National Defense Research Committee, July, 1944.

10 C. M. Stockman, “A Method of Obtaining an Approximation for the Operating
Characteristic of a Wald Sequential Probability Ratio Test Applied to a Binomial
Distribution,” (British) Ministry of Supply, Advisory Service on Statistical
Method and Quality Control, Technical Report, Series “R,” No. Q.C./R/19.

11 Abraham Wald, “A General Method of Deriving the Operating Characteristics
of any Sequential Probability Ratio Test,” unpublished memorandum submitted
to the Statistical Research Group, Columbia University, April, 1944.

12 Abraham Wald, “On Cumulative Sums of Random Variables,” The Annals
of Mathematical Statistics, Vol. 15 (Sept., 1944).

13 The authorship of the revised edition, which was published by the Columbia
University Press, Sept., 1945, is ascribed to the group as a whole.

Independently of the development in this country and about the 
same time, G. A. Barnard recognized the merits of a sequential method 
of testing. 14 He treated the problem of double dichotomies, using a
sequential method of testing which, however, differs from the one that 
results from the application of the sequential probability ratio test. 

This book consists of three parts and an Appendix. Part I contains 
a discussion of the general theory of the sequential probability ratio 
test. Part II discusses applications of the general theory given in
Part I. These applications are given primarily to illustrate the
general theory and to bring out some points of theoretical interest which
are specific to these applications. Accordingly, computational
simplifications are not stressed much and hardly any tables are given. 15
Part III outlines briefly a possible approach to the problem of sequen- 
tial multi-valued decisions and estimation. This field is largely un- 
explored and further progress is still a matter of future developments. 
To facilitate the use of the book by readers with no advanced mathe- 
matical training, mathematical derivations of somewhat intricate na- 
ture are included in the Appendix. 

14 G. A. Barnard, “Economy in Sampling with Reference to Engineering Experi- 
mentation,” (British) Ministry of Supply, Advisory Service on Statistical Method 
and Quality Control, Technical Report, Series “R,” No. Q.C./R/7. 

15 For a more complete and detailed discussion of these applications the reader
is referred to the revised edition of the publication of the Statistical Research
Group mentioned before.

Chapter 1. ELEMENTS OF THE CURRENT THEORY OF
TESTING STATISTICAL HYPOTHESES 

1.1 Random Variables and Probability Distributions 

1.1.1 Notion of a Random Variable 

The outcome of an experiment or the reading of a measurement is 
usually a variable quantity or, more briefly, a variable, since generally 
it can take different values. For example, repeated measurements on 
the length of a bar will yield, in general, different values. Frequently, 
it will be possible to make probability statements concerning the out- 
come of an experiment or the reading of a measurement. Consider, 
for example, the experiment consisting of the throw of a die whose sides 
are numbered from 1 to 6. Here the outcome of the experiment may be 
any integral number from 1 to 6. Various probability statements
regarding the outcome of the experiment can be made. For example, the
probability that the outcome will be equal to 5 is equal to 1/6, or the
probability that the outcome will be less than 4 is equal to 1/2, and so forth.
Probability statements can also be made about the outcome of the 
following experiment: Suppose that an individual is selected at random 
from a group of 1000 individuals and that his height is then measured. 
The probability that the height of the selected individual will be less 
than 68 inches is equal to 1/1000 times the number of individuals in the
group whose heights are less than 68 inches. 

A variable x is called a random variable if for any given value c a 
definite probability can be ascribed to the event that x will take a value
less than c. A general class of experiments where the outcome is a 
random variable in the sense of the above definition may be described 
as follows. Consider a class of N objects (or individuals) and some 
measurable characteristic of these objects, such as weight, diameter, or 
hardness. Suppose that the value x of this characteristic varies from 
object to object in the class. The experiment consists in selecting at 
random one object from the class of N objects, and then measuring 
the value x of the characteristic of the selected object. Random selec- 
tion is selection of an object in such a way that each object in the 
class of N objects has an equal chance of being chosen. The outcome
x of such an experiment is a random variable, since a probability can
be ascribed to the event that x will take a value less than c, for any 
given value c. This probability is, in fact, equal to N_c/N, where N_c
is the number of objects in the class for which the characteristic under
consideration has a value less than c. An interesting special case is
that in which the characteristic under consideration can take only two 
values. Such a situation arises, for instance, in the case of a manufac- 
tured product where each unit is classified in one of two categories: 
defective or non-defective. We shall ascribe the value 0 to a non- 
defective unit and the value 1 to a defective unit. Then the charac- 
teristic under consideration, i.e., the characteristic of being defective 
or non-defective, can take only the values 0 and 1. Consider a lot 
consisting of N units and let N_d be the number of defectives in the lot.
If the experiment consists in inspecting a single unit drawn at random
from the lot, the outcome x of the experiment is a random variable
which can take only the values 0 and 1. The probability that x = 0
is equal to (N − N_d)/N, and the probability that x = 1 is equal to
N_d/N.
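
The proportion rule N_c/N and the two-valued (defective or non-defective) case can be sketched in a few lines of Python. This is only an illustration: the lot of 20 units with 3 defectives is a made-up example, and the helper name prob_less_than is hypothetical, not notation from the text.

```python
from fractions import Fraction

def prob_less_than(values, c):
    """Probability that a randomly selected object has a value < c, i.e. N_c / N."""
    n_c = sum(1 for v in values if v < c)
    return Fraction(n_c, len(values))

# A fair die, viewed as a population of six equally likely outcomes.
die = [1, 2, 3, 4, 5, 6]
p_equal_5 = Fraction(sum(1 for v in die if v == 5), len(die))
p_less_4 = prob_less_than(die, 4)

# Hypothetical lot: N = 20 units, N_d = 3 defectives (1 = defective, 0 = non-defective).
lot = [1] * 3 + [0] * 17
p_defective = Fraction(sum(lot), len(lot))   # N_d / N
p_non_defective = 1 - p_defective            # (N - N_d) / N

print(p_equal_5, p_less_4, p_defective, p_non_defective)
```

Exact fractions are used so that the results can be compared directly with the proportions stated in the text.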

1.1.2 Cumulative Distribution Function (c.d.f.) of a Random Variable

Let x be a random variable and denote by F(t) the probability that
x will take a value less than a given value t. Then F(t) is a function
of t which is called the cumulative distribution function of x. Since
any probability must lie between 0 and 1, we must have 0 ≤ F(t) ≤ 1
for all values of t. If t_1 and t_2 are two values such that t_1 < t_2, then the
probability that x < t_2 is greater than or equal to the probability that
x < t_1, i.e., F(t_2) ≥ F(t_1). In other words, F(t) cannot decrease as
t increases. A typical form of a c.d.f. F(t) is shown in Fig. 1 where t 
is measured along the horizontal axis and F(t) along the vertical axis.

For any given values a and b (a < b) we can easily derive the value
of the probability that a ≤ x < b from the c.d.f. F(t). In fact, the
event that x < a and the event that a ≤ x < b are mutually
exclusive. Hence, the probability that one of these events will occur is
equal to the sum of the two probabilities: the probability that x < a
and the probability that a ≤ x < b. Thus, we have

(1:1) (probability that either x < a or a ≤ x < b)
      = (probability that x < a) + (probability that a ≤ x < b)

Since the probability that either x < a or a ≤ x < b is the same as
the probability that x < b, we obtain, from (1:1),

(1:2) F(b) = F(a) + (probability that a ≤ x < b)

Hence, the probability that a ≤ x < b is equal to F(b) − F(a).
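
Relation (1:2) can be checked numerically on a small finite population, where F(t) is just the proportion of values less than t. The eight measurements below are invented for illustration, and the function name cdf is an assumption of this sketch.

```python
from fractions import Fraction

def cdf(population, t):
    """F(t): proportion of the population with a value strictly less than t."""
    return Fraction(sum(1 for v in population if v < t), len(population))

population = [2, 3, 3, 5, 7, 8, 8, 9]   # hypothetical measurements, N = 8
a, b = 3, 8

# Probability that a <= x < b, counted directly ...
direct = Fraction(sum(1 for v in population if a <= v < b), len(population))
# ... and obtained from the c.d.f. as F(b) - F(a).
via_cdf = cdf(population, b) - cdf(population, a)

# F(t) never decreases as t increases.
grid = range(0, 12)
non_decreasing = all(cdf(population, s) <= cdf(population, s + 1) for s in grid)

print(direct, via_cdf, non_decreasing)
```

Both routes give the same proportion, which is the content of (1:2) when probabilities are read as proportions.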

A simple interpretation of the c.d.f. F(t) can be given if the random 
variable x is the value of a measurement on an object selected at ran- 
dom from a given group of N objects. As mentioned in Section 1.1.1, 
in this case the probability that the observed value of x satisfies some 
equality or inequality relationship, such as x = c, or x < c, or a < x 
< b, is equal to the proportion of objects in the group of N objects 
for which the value of x satisfies the equality or inequality in question. 
Thus, F(t) is simply equal to the proportion of objects in the group 
for which x < t. With this interpretation of probability, the validity 
of (1:2) becomes self-evident. It merely says this: The proportion of 
objects for which x < b is equal to the proportion of objects for which 
x < a plus the proportion of objects for which a ≤ x < b. The group
of N objects is frequently called the population or universe. So far we have
considered only populations which contain a finite number of objects. 
Such populations are called finite populations. 

The interpretation of the probability that a certain relation (equality 
or inequality) holds as the proportion of objects in the population for 
which the value of x satisfies that relation proves useful in many 
instances and we shall employ it frequently. However, if we restrict 
ourselves to finite populations, such an interpretation is not always 
possible. In fact, the c.d.f.’s which arise from finite populations are 
of a special nature. Suppose that N is the number of objects in the 
population. Then the random variable x can take at most N different
values. Let a_1, ..., a_M be the different values x can take, arranged in
ascending order of magnitude, i.e., a_1 < a_2 < ... < a_M. Clearly,
M ≤ N. If the value of x is the same for several objects, then M < N.
The c.d.f. of x will be a step function of the type shown in Fig. 2. The
distribution function makes exactly M jumps and the magnitude of
each jump is equal to 1/N or an integral multiple of 1/N. A c.d.f.
represented by a continuous curve, as shown in Fig. 1, is certainly not 
of this type. Thus, if the c.d.f. is given by a continuous curve, the 
interpretation of probabilities as proportions of a finite population is 
not possible. However, any c.d.f. can be approximated arbitrarily 
closely by a c.d.f. arising from a finite population, if the number N of 
objects in the population is sufficiently large. Thus, any c.d.f. can be
regarded as a limiting form of a c.d.f. arising from a finite population
when the number of objects in the population is increased indefinitely. 
This means that if we admit infinite populations 1 (populations with 
infinitely many objects), the interpretation of any probability as a 
certain proportion of an underlying population is always possible. Of
course, the notion of an infinite population is only an abstraction
constructed merely for the purpose of simplifying the theory. To give an
example of an underlying infinite population, consider a measurement
on the length of a bar, the outcome of which is regarded as a random
variable x having a c.d.f. F(t). Then the underlying infinite
population may be thought of as an infinite sequence of repeated
measurements on the length of the bar, and the actually observed measurement
is considered an element drawn from this population. Sometimes the
underlying population is finite, but the number N of objects in the
population is so large that we may find it more convenient to treat
the problem as if N were infinity, i.e., as if the population were infinite.
Suppose, for example, that we are interested in the height distribution
of all male individuals of age 20 and above living in the United States.
The number of such individuals is so large that considerable
mathematical simplification may be achieved by treating the population of
such individuals as if it were infinite.

1 By an infinite population we mean an ordered infinite sequence of objects
O_1, O_2, ..., ad inf. A certain measurable characteristic of these objects is considered
and the value x of this characteristic is assumed to vary from object to object.
By the proportion of objects in the infinite population for which x satisfies a given
relation (equality or inequality) we mean the limiting value of the corresponding
proportion in the finite population (O_1, ..., O_N) as N increases indefinitely.
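
The step-function form of a c.d.f. arising from a finite population can be made concrete with a short sketch. The population values are invented; the sketch only illustrates that there are M ≤ N jumps and that each jump is an integral multiple of 1/N.

```python
from collections import Counter
from fractions import Fraction

population = [4, 4, 6, 7, 7, 7, 9, 10]   # hypothetical values, N = 8
N = len(population)

# Each distinct value a_1 < ... < a_M produces one jump of size (multiplicity) / N.
counts = Counter(population)
jumps = {value: Fraction(count, N) for value, count in sorted(counts.items())}
M = len(jumps)

for value, size in jumps.items():
    print(f"jump of {size} at t = {value}")
print("M =", M, "N =", N)
```

Because the values 4 and 7 are repeated in this made-up population, M is strictly smaller than N, as the text states for populations in which several objects share the same value.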


1.1.3 Probability Density Function 

Let F(t) be the c.d.f. of a random variable x. As we have seen in
Section 1.1.2, the probability that t − Δ/2 ≤ x < t + Δ/2 (Δ > 0) is given
by F(t + Δ/2) − F(t − Δ/2). The limiting value f(t) of the ratio

    \frac{F(t + \Delta/2) - F(t - \Delta/2)}{\Delta}

as Δ approaches 0, provided that such a limiting value exists, 2 is called
the probability density of the random variable x at the value x = t.
The probability density f(t) is a function of
t and is called the probability density function of the random variable
x. It follows from the definition of the probability density f(t) that
for small positive values Δ the product f(t)Δ is a good approximation
to the probability that x will lie in the interval t ± Δ/2. A probability
density function does not always exist. If the random variable x is
discrete, i.e., if x can take only discrete values, the c.d.f. is a step
function and no probability density function exists.

The probability that x will take a value within the interval from
t_1 to t_2 (t_1 < t_2) can be obtained by integrating the probability density
function f(t) from t_1 to t_2; i.e., the probability in question is given by

    \int_{t_1}^{t_2} f(t) \, dt

2 The existence of the limiting value of

    \frac{F(t + \Delta) - F(t)}{\Delta}

is required, where Δ may be positive or negative and may approach 0 in any
arbitrary manner. The existence of this limiting value implies the existence of
the limiting value of

    \frac{F(t + \Delta/2) - F(t - \Delta/2)}{\Delta}

One of the most important probability density functions is the
so-called normal probability density function, which is given by

(1:3)    f(t) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(t - \mu)^2}

where μ and σ are some constant values. If a random variable x has
a probability density function f(t) given by (1:3), we say that x is 
normally distributed, or x has a normal distribution. The shape of a 
normal curve is shown in Fig. 3, where t is measured along the hori- 
zontal axis and f(t) along the vertical axis. 
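
The two statements of this section, that f(t)Δ approximates the probability of the interval t ± Δ/2 and that interval probabilities are integrals of f(t), can be tried out on the normal density (1:3). The sketch below assumes μ = 0 and σ = 1 purely for illustration, expresses the normal c.d.f. through the error function from Python's standard library, and uses a simple midpoint rule for the integral; none of these choices come from the text.

```python
import math

MU, SIGMA = 0.0, 1.0   # assumed parameter values, chosen only for illustration

def normal_pdf(t, mu=MU, sigma=SIGMA):
    """Normal probability density (1:3)."""
    return math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def normal_cdf(t, mu=MU, sigma=SIGMA):
    """F(t) for the normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def interval_probability(t, delta):
    """Probability that t - delta/2 <= x < t + delta/2."""
    return normal_cdf(t + delta / 2.0) - normal_cdf(t - delta / 2.0)

def integrate_pdf(t1, t2, n=10_000):
    """Midpoint-rule approximation of the integral of f(t) from t1 to t2."""
    h = (t2 - t1) / n
    return h * sum(normal_pdf(t1 + (i + 0.5) * h) for i in range(n))

t = 0.7
for delta in (0.5, 0.1, 0.01):
    # The two printed numbers approach each other as delta shrinks.
    print(delta, interval_probability(t, delta), normal_pdf(t) * delta)

print(integrate_pdf(-1.0, 1.0), normal_cdf(1.0) - normal_cdf(-1.0))
```

The midpoint rule is only one convenient choice; any standard quadrature would serve equally well here.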



1.1.4 Discrete Random Variables 

A random variable x is called discrete if it can take only discrete 
values. Any variable which can take only a finite number of different 
values is, of course, a discrete variable. A variable which can take 
infinitely many values may still be discrete. For example, if the vari- 
able x is restricted to integral values, x is discrete. The c.d.f. of a dis- 
crete random variable is a step function, as shown in Fig. 2. Thus, a 
discrete random variable has no probability density function, but 
admits an elementary probability law f(t), where f(t) denotes the
probability that x = t.

In what follows we shall consider only random variables which 
either admit a probability density function or have a discrete
distribution. By the probability distribution, or more briefly distribution,
f(t), of a random variable x, we shall always mean the probability
density function of x, if a probability density function exists. If x is
a discrete random variable, f(t) will denote the probability that x = t.
We shall sometimes refer to the distribution f(t) of x also as the
population distribution of x, or the distribution of x in the population.

1.1.5 Expected Value and Higher Moments of a Random Variable

Suppose that x is a random variable which has a discrete
distribution. Let f(t) denote the distribution of x, i.e., f(t) is the probability
that x = t. Then the expected value of x, in symbols E(x), is
defined by

(1:4)    E(x) = \sum_t t f(t)

where the summation is to be taken over all possible values t of x.
Interpreting the probability f(t) as the proportion of objects in the
population for which x = t, we see from (1:4) that the expected value
E(x) of x is the same as the mean value of x in the population. If x
is a continuous variable which admits a probability density function
f(t), then the expected value of x is given by

(1:5)    E(x) = \int_{-\infty}^{\infty} t f(t) \, dt

The expected value of x is often called also the population mean, or
mean of x.

A function φ(x) of a random variable x is itself a random variable.
For any positive integer r and for any constant c, the expected value
of (x − c)^r is called the rth population moment of x referred to the
value c. Of special interest is the case in which c = E(x). The
expected value of [x − E(x)]^r is called the rth moment of x referred to
the mean. The second moment referred to the mean, i.e., the expected
value of [x − E(x)]^2, is also called the variance of x. The square root
of the variance is called the standard deviation.
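
For a discrete distribution these definitions reduce to finite sums. As a small worked sketch, the fair die of Section 1.1.1 is used below; the function names expected_value and moment are illustrative choices, not notation from the text.

```python
import math
from fractions import Fraction

# Distribution f(t) of a fair die: each value t = 1, ..., 6 has probability 1/6.
f = {t: Fraction(1, 6) for t in range(1, 7)}

def expected_value(dist):
    """E(x) as in (1:4): the sum of t * f(t) over all possible values t."""
    return sum(t * p for t, p in dist.items())

def moment(dist, r, c):
    """rth moment of x referred to the value c: the expected value of (x - c)^r."""
    return sum((t - c) ** r * p for t, p in dist.items())

mean = expected_value(f)
variance = moment(f, 2, mean)             # second moment referred to the mean
standard_deviation = math.sqrt(variance)  # square root of the variance

print(mean, variance, standard_deviation)
```

The mean comes out as 7/2 and the variance as 35/12; the first moment referred to the mean is always zero, which gives a quick sanity check on the helper.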

Consider the normal probability density function

(1:6)    f(t) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(t - \mu)^2}

where μ and σ are constants (σ > 0). Let x be a random variable
whose distribution is given by (1:6). That the expected value of x
is then equal to μ and the variance of x is equal to σ^2 can easily be
verified.
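
The verification can also be done numerically. The sketch below integrates t f(t) and (t − E(x))^2 f(t) for the density (1:6) with a midpoint rule over a wide finite range; the values μ = 3 and σ = 2, the range of ±12σ, and the number of subintervals are all arbitrary choices made for this illustration.

```python
import math

def normal_pdf(t, mu, sigma):
    """Normal probability density (1:6)."""
    return math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate(g, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

mu, sigma = 3.0, 2.0                              # assumed values for illustration
lo, hi = mu - 12.0 * sigma, mu + 12.0 * sigma     # tails beyond this range are negligible

total = integrate(lambda t: normal_pdf(t, mu, sigma), lo, hi)
mean = integrate(lambda t: t * normal_pdf(t, mu, sigma), lo, hi)
variance = integrate(lambda t: (t - mean) ** 2 * normal_pdf(t, mu, sigma), lo, hi)

print(total, mean, variance)
```

The three printed numbers should be close to 1, μ, and σ^2 respectively; this is a numerical check, not a substitute for the analytic verification.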

1.2 Notion of a Statistical Hypothesis 

1.2.1 Unknown Parameters of a Distribution 

Let x be a random variable. A statistical problem arises when the
distribution of x is not known and we want to draw some inference
concerning the unknown distribution of x on the basis of a limited
number of observations on x. Frequently, the distribution of x is not
entirely unknown, i.e., some partial knowledge of the distribution of 
x is available a priori. To illustrate this we shall consider the two 
following examples. 


Example 1. Consider a lot consisting of N units of a certain manufactured 
product. Suppose that each unit is classified in one of the two categories, defective 
and non-defective. The value 0 is assigned to each non-defective unit and the 
value 1 to each defective unit. One unit is drawn at random from the lot and is 
inspected. The outcome x of this experiment is a random variable which can take 
only the values 0 and 1. Denote by p the proportion of defectives in the lot. 
Then the probability that x = 1 is equal to p and the probability that x = 0 is
equal to 1 − p. Thus, if the value of p were known, the distribution of x would be
completely known. Usually p is unknown and we want to make some inference
regarding the value of p by inspecting a limited number of units drawn from the lot.
If p is unknown, we have only partial knowledge of the distribution of x; we know
merely that x is restricted to the values 0 and 1. In this case p is considered an 
unknown parameter which can have any value between 0 and 1. We shall also 
say that the distribution of x involves an unknown parameter p. Thus in this 
example the distribution of x is known except for the value of an unknown para- 
meter p. 

Example 2. Suppose that the length of a bar is measured with an instrument 
for which the error of measurement is known to be normally distributed. The 
outcome x of such a measurement is then a normally distributed random variable, 
i.e., the distribution of x is given by the normal density function

    f(t) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(t - \mu)^2}

Usually the mean μ and the variance σ^2 of the distribution are unknown. These
quantities are also called the parameters of the normal distribution. The mean μ
can take any real value and σ^2 can take any positive value. Thus, in this example
too, the distribution function is known except for the values of the parameters
μ and σ^2 involved in the distribution function.


A general situation similar to that given in Examples 1 and 2 may 
be described as follows: The functional form of the distribution function 
is known and merely the values of a finite number of parameters involved 
in the distribution function are unknown; i.e., the distribution function 
is known except for the values of a finite number of parameters. In
Example 1 the only unknown parameter is the proportion p of defectives
in the lot. In Example 2 there are two unknown parameters, the mean
μ and the variance σ^2.

In what follows we shall assume that the distribution of the random
variable x is known except for the values of a finite number of
parameters.

1.2.2 Simple and Composite Hypotheses

Let θ_1, ..., θ_k be the unknown parameters of the distribution of the
random variable x under consideration. A statement about the values
of θ_1, ..., θ_k is called a simple hypothesis if it determines uniquely the
values of all k parameters. It is called a composite hypothesis if it is
consistent with more than one value for some parameter. For
example, if there are two unknown parameters, θ_1 and θ_2, involved in
the distribution of x, the hypothesis that θ_1 = 2 and θ_2 = 4 is a simple
hypothesis, since it specifies completely the values of the unknown
parameters. On the other hand, the hypothesis that θ_1 = θ_2 is
composite. In Example 1 the statement that the unknown proportion p
of defectives is equal to .2 is a simple hypothesis. On the other hand,
the statement that p lies between .1 and .3 is a composite hypothesis.
In Example 2 the statement that μ = 3 would be a composite
hypothesis, since it does not specify the value of the unknown variance σ^2.

In general, the parameters $\theta_1, \cdots, \theta_k$ will not be subject to any
a priori restrictions; i.e., they may take any values. However, the 
parameters may in some cases be restricted to certain intervals. For 
instance, if one of the unknown parameters is the standard deviation, 
this parameter is restricted to positive values. In other cases, the 
parameter may be able to take only a finite number of discrete values. 

1.3 Outline of the Current Procedure for Testing Statistical Hypotheses

1.3.1 The Sample 

Let x be a random variable and suppose that we wish to test a
hypothesis concerning the unknown parameters of the distribution of
x. The decision to accept or reject the hypothesis in question is always
made on the basis of a finite number of observations on x. A set of a
finite number of observations on x is called a sample. The number of
observations contained in the sample is called the size of the sample.

We shall be concerned mostly with the case in which the successive
observations on x are independent in the probability sense. The
successive observations $x_1, \cdots, x_n$ on x are said to be independent in the
probability sense if the (conditional) probability distribution of the ith
observation $x_i$ ($i = 2, \cdots, n$), when the values of the preceding
observations $x_1, \cdots, x_{i-1}$ are known, is not affected by these values. This
condition cannot be strictly fulfilled if the successive observations are
drawn from a finite population. Consider, for instance, the case
discussed in Example 1 on page 12. Suppose that two successive units
are drawn at random from the lot. Denote by $x_1$ the value of x for
the first unit and by $x_2$ the value of x for the second unit. The
distribution of $x_1$ is clearly given as follows: the probability that $x_1 = 0$ is
$1 - p$ and the probability that $x_1 = 1$ is equal to p. The distribution
of $x_2$, when the value of $x_1$ is known, is given as follows: if $x_1 = 0$,
then the probability that $x_2 = 1$ is equal to $pN/(N - 1)$ and the
probability that $x_2 = 0$ is equal to $1 - [pN/(N - 1)]$. On the other hand,
if $x_1 = 1$, the probability that $x_2 = 1$ is equal to $(pN - 1)/(N - 1)$
and the probability that $x_2 = 0$ is equal to $1 - [(pN - 1)/(N - 1)]$.
Thus, the probability distribution of $x_2$ is affected by the outcome of
$x_1$. For similar reasons no strict independence can prevail in any other
case in which the successive observations are drawn from a finite
population. However, if the number of objects in the finite population is
sufficiently large, the dependence is only slight and can be neglected.
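
The size of this dependence can be checked numerically. The following is a minimal sketch, assuming a lot of N units of which pN are defective; the function name `second_draw_defective_prob` and the lot sizes are illustrative choices, not part of the text.

```python
from fractions import Fraction

def second_draw_defective_prob(N, defectives, x1):
    """P(x2 = 1 | x1) when two units are drawn without replacement
    from a lot of N units containing `defectives` defective units."""
    remaining_defectives = defectives - x1  # one defective is gone if x1 = 1
    return Fraction(remaining_defectives, N - 1)

for N in (10, 100, 10_000):
    d = N // 5                                # p = 0.2 in every lot
    p0 = second_draw_defective_prob(N, d, 0)  # equals pN/(N - 1)
    p1 = second_draw_defective_prob(N, d, 1)  # equals (pN - 1)/(N - 1)
    print(N, float(p0), float(p1), float(p0 - p1))
```

The two conditional probabilities differ by exactly 1/(N - 1), so the gap shrinks as the lot grows, which is the sense in which the dependence can be neglected for large lots.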

Let x be a discrete random variable, and denote the distribution of
x by f(t), i.e., f(t) is the probability that x = t. Let $x_1, \cdots, x_n$ be a
set of n independent observations on x. Because of the independence
of the observations, the probability of obtaining a sample equal to the
observed one is given by the product

$$f(x_1) f(x_2) \cdots f(x_n)$$

This product is also called the joint probability distribution of the
observations $x_1, \cdots, x_n$.

If x is a continuous random variable admitting a probability density
function f(x), then the joint density function of n independent
observations $x_1, \cdots, x_n$ on x is given by the product

$$f(x_1) f(x_2) \cdots f(x_n)$$
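
As a small illustration of the product formula, the sketch below evaluates the joint probability (or joint density) of a sample for the distributions of Examples 1 and 2. The helper names `joint_probability`, `bernoulli`, and `normal_density`, and the sample values, are assumptions made for this illustration.

```python
import math

def joint_probability(f, sample):
    """Product f(x_1) f(x_2) ... f(x_n) for independent observations."""
    return math.prod(f(x) for x in sample)

def bernoulli(p):
    # Distribution of Example 1: x = 1 (defective) with probability p, else 0.
    return lambda t: p if t == 1 else 1 - p

def normal_density(mu, sigma):
    # Normal density of Example 2 with mean mu and standard deviation sigma.
    return lambda t: math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Discrete case: probability of observing (0, 0, 1, 0) when p = 0.2.
print(joint_probability(bernoulli(0.2), [0, 0, 1, 0]))
# Continuous case: joint density of three measurements under mean 0, variance 1.
print(joint_probability(normal_density(0.0, 1.0), [0.1, -0.3, 0.5]))
```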

1.3.2 The General Nature of a Test Procedure

Denote by n the number of observations on the basis of which the
acceptance or rejection of the hypothesis in question is to be decided.
Any possible outcome of n successive observations is a sample of size n.
A test procedure leading to the acceptance or rejection of the
hypothesis in question is simply a rule specifying, for each possible sample of
size n, whether the hypothesis should be rejected or accepted on the
basis of that sample. This may also be expressed as follows: A test
procedure is simply a subdivision of the totality of all possible samples
of size n into two mutually exclusive parts, say part 1 and part 2,
together with the application of the rule that the hypothesis be
rejected if the observed sample is contained in part 1 and that the
hypothesis be accepted if the observed sample is contained in part 2.
Part 1 is also called the critical region. Since part 2 is the totality of
all samples of size n which are not included in part 1, part 2 is uniquely
determined by part 1. Thus, choosing a test procedure is equivalent
to determining a critical region.

As an illustration, we shall discuss a few examples. Suppose that a 
lot consisting of N units of a manufactured product is submitted for 
acceptance inspection. Assume that each unit is classified in one of 
the two categories: defective and non-defective. The proportion p of 
defectives in the lot is assumed to be unknown. Let $p_0$ be a value
between 0 and 1 such that we prefer to accept the lot if the proportion
p of defectives is $\le p_0$ and we prefer to reject the lot if $p > p_0$. Suppose
that a sample of n units, drawn at random from the lot, is inspected
and on the basis of this sample a decision is to be made to accept the
lot or reject it. In other words, on the basis of the inspection of the
sample of n units a decision is to be made to accept the hypothesis
$p \le p_0$ or reject it. The critical region generally used in this case is
defined as follows: The hypothesis that $p \le p_0$ is rejected, i.e., the lot
is rejected, if, and only if, the proportion of defectives in the observed 
sample of n units exceeds a suitably chosen numerical constant c. 
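
To see how the constant c governs such a test, the following sketch computes the probability that the lot is rejected for a given true proportion p. It assumes the lot is large enough for the number of defectives in the sample to be treated as binomial; the function name `rejection_probability` and the numerical values are illustrative assumptions.

```python
from math import comb

def rejection_probability(n, c, p):
    """P(proportion of defectives in a sample of n exceeds c) under a
    binomial model, i.e. assuming the lot is large relative to n."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d)
               for d in range(n + 1) if d / n > c)

n, c = 50, 0.17   # reject when more than 17 per cent of the sample is defective
for p in (0.05, 0.10, 0.20, 0.30):
    print(p, rejection_probability(n, c, p))
```

The rejection probability increases with p, so a larger c makes rejection of a good lot less likely at the cost of accepting bad lots more often.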

Another example: Suppose that the length of a bar is measured with 
an instrument for which the error of measurement is known to be 
normally distributed with variance equal to unity. Thus, the outcome 
x of a measurement is a normally distributed random variable with
mean $\mu$ equal to the true length of the bar and variance unity. Let
the hypothesis to be tested be the statement that the true length of
the bar is equal to a specified value $\mu_0$. This hypothesis is to be tested
on the basis of a sample consisting of n independent measurements
$x_1, \cdots, x_n$ on the length of the bar. The critical region generally used
for this purpose is defined as follows: The hypothesis that $\mu = \mu_0$ is
rejected if, and only if, the sample observed is such that $|\bar{x} - \mu_0| \ge c$,
where $\bar{x}$ denotes the arithmetic mean of the n observations and c is a
suitably chosen numerical constant.

There are, in general, infinitely many possibilities for choosing a 
critical region. For instance, in the example just discussed we could 
have used the median, or the geometric mean, or the harmonic mean, 
or some other mean of the observations instead of the arithmetic mean. 
The various critical regions cannot be regarded as equally good and 
the fundamental problem in testing hypotheses is to set up principles 
for the proper choice of the critical region. Such principles have been 
advanced by Jerzy Neyman and Egon S. Pearson. In the next section 
we shall discuss briefly the basic idea of the Neyman-Pearson theory. 3

3 See, for example, J. Neyman and E. S. Pearson, Statistical Research Memoirs,
University College, London, Vol. I (1936), pp. 1-37.

1.3.3 Principles for Choosing a Critical Region

The principles formulated by Neyman and Pearson for the proper 
choice of a critical region constituted an advance of fundamental im- 
portance in the theory of testing hypotheses. The purpose of this 
section is to indicate briefly the basic idea of the Neyman-Pearson 
theory. 

A simple case of particular theoretical interest arises when only one
unknown parameter $\theta$ is involved in the distribution of the random
variable x under consideration, and $\theta$ can take only two values, $\theta_0$ and
$\theta_1$. The basic idea of the Neyman-Pearson theory can be indicated
even in this simple case. Therefore, in the rest of this section, as
well as in the following section, 1.3.4, we shall restrict ourselves to
the case of a single parameter $\theta$ which can take only two values,
$\theta_0$ and $\theta_1$.

For any value $\theta$ of the parameter, let $f(x, \theta)$ denote the distribution
of x. We shall denote $f(x, \theta_0)$ by $f_0(x)$ and $f(x, \theta_1)$ by $f_1(x)$. Suppose
that it is desired to test the hypothesis that $\theta = \theta_0$. We shall refer to
this hypothesis as the null hypothesis and denote it by $H_0$. The
hypothesis that $\theta = \theta_1$ will be called the alternative hypothesis and will
be denoted by $H_1$. Thus, we shall deal with the problem of testing the
hypothesis $H_0$ against the alternative hypothesis $H_1$ on the basis of
a sample of n independent observations $x_1, \cdots, x_n$ on x.

As a basis for choosing among critical regions the following
considerations have been advanced by Neyman and Pearson: In accepting or
rejecting $H_0$, we may commit errors of two kinds. We commit an
error of the first kind if we reject $H_0$ when it is true; we commit an
error of the second kind if we accept $H_0$ when $H_1$ is true. After a
particular critical region W has been chosen, the probability of
committing an error of the first kind, as well as the probability of
committing an error of the second kind, is uniquely determined. The probability
of committing an error of the first kind is equal to the probability,
determined on the assumption that $H_0$ is true, that the observed
sample will be included in the critical region W. The probability of
committing an error of the second kind is equal to the probability,
determined on the assumption that $H_1$ is true, that the observed sample
will fall outside the critical region W. For any given critical region
W we shall denote the probability of an error of the first kind by $\alpha$
and the probability of an error of the second kind by $\beta$.

The probabilities $\alpha$ and $\beta$ have the following important practical
interpretation: Suppose we draw a large number of samples of size n.
Let M be the number of such samples drawn. Suppose that for each
of these M samples we reject $H_0$ if the sample is included in W and
accept $H_0$ if the sample lies outside W. In this way we make M
statements of rejection or acceptance. Some of these statements will in
general be wrong. If $H_0$ is true and if M is large, the probability is
nearly 1 (i.e., it is practically certain) that the proportion of wrong
statements (i.e., the number of wrong statements divided by M) will
be approximately $\alpha$. If $H_1$ is true, the probability is nearly 1 that the
proportion of wrong statements will be approximately $\beta$. Thus, we
can say that in the long run the proportion of wrong statements will
be $\alpha$ if $H_0$ is true and $\beta$ if $H_1$ is true.
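
This long-run interpretation can be tried out by simulation. The sketch below is an illustration under assumed values: x is a 0/1 variable with proportion .1 under the null hypothesis and .3 under the alternative, n = 20, and the critical region consists of samples with at least 5 defectives. None of these numbers, nor the function names, come from the text.

```python
import random
from math import comb

def binom_tail(n, k, p):
    """P(at least k successes in n independent trials with success probability p)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(k, n + 1))

def wrong_statement_rate(M, n, k, p_true, h0_true, rng):
    """Draw M samples of size n; reject H0 when a sample has >= k defectives.
    Returns the proportion of wrong statements among the M decisions."""
    wrong = 0
    for _ in range(M):
        defectives = sum(rng.random() < p_true for _ in range(n))
        rejected = defectives >= k
        # Rejecting is wrong when H0 is true; accepting is wrong when H1 is true.
        wrong += rejected if h0_true else not rejected
    return wrong / M

n, k, p0, p1 = 20, 5, 0.1, 0.3
alpha = binom_tail(n, k, p0)       # probability of an error of the first kind
beta = 1 - binom_tail(n, k, p1)    # probability of an error of the second kind
rng = random.Random(0)
print(alpha, wrong_statement_rate(20_000, n, k, p0, True, rng))
print(beta, wrong_statement_rate(20_000, n, k, p1, False, rng))
```

With M as large as 20,000 the simulated proportions should lie close to the exact values of the two error probabilities, though any single run will differ slightly from them.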

It is clear that one critical region W is more desirable than another
if it has smaller values of $\alpha$ and $\beta$. Although either $\alpha$ or $\beta$ can be made
arbitrarily small by a proper choice of the critical region W, it is
impossible to make both $\alpha$ and $\beta$ arbitrarily small for a fixed value of n,
i.e., a fixed sample size. To illustrate this point, consider the
following two extreme cases: (1) W is empty, i.e., we always accept $H_0$,
irrespective of the outcome of the sample. In this case $\alpha = 0$ and $\beta = 1$.
(2) W is the totality of all possible samples, i.e., we always reject $H_0$.
In this case $\alpha = 1$ and $\beta = 0$. If, for some reason, we decide to
consider only critical regions W for which $\alpha$ has a given fixed value, the
choice of W is based on the following principle, introduced by Neyman
and Pearson: Restricting ourselves to regions W for which $\alpha$ has a fixed
value, we choose that one for which $\beta$ is a minimum.

The quantity $\alpha$ is called the size of the critical region, and the
quantity $1 - \beta$, the power of the critical region. A critical region
which has the highest power in the class of all regions of equal size
is called a most powerful region. Since minimizing $\beta$ is the same as
maximizing $1 - \beta$, the Neyman-Pearson principle concerning the
choice of the critical region W can be formulated as follows:
Restricting ourselves to regions of a fixed size $\alpha$, we choose that one which is
most powerful.

For a fixed sample size, the probability $\beta$ is a (single-valued)
function of $\alpha$, say $\beta(\alpha)$, if a most powerful critical region is used. Thus,
given the number of observations on which the test is based, one of
the quantities $\alpha$ and $\beta$ can still be chosen arbitrarily. The
Neyman-Pearson theory leaves the question of this choice open. It is clear
that if $\alpha$ is small, $\beta(\alpha)$ is in general large, and if $\alpha$ is large, $\beta(\alpha)$ is in
general small. The choice of $\alpha$ (or $\beta$) will be greatly influenced by the
relative importance of the errors of the first and second kinds in each
particular application. Suppose, for example, that the loss caused by
an error of the first kind is one dollar and the loss caused by an error
of the second kind is merely one cent. Then a small $\alpha$ and a large $\beta$
will be preferable to a large $\alpha$ and a small $\beta$.




Neyman and Pearson show that a region consisting of all samples
$(x_1, \cdots, x_n)$ which satisfy the inequality

$$\frac{f_1(x_1) f_1(x_2) \cdots f_1(x_n)}{f_0(x_1) f_0(x_2) \cdots f_0(x_n)} \ge k \tag{1:7}$$

is a most powerful critical region for testing the hypothesis $H_0$ against
the alternative hypothesis $H_1$. The term k on the right-hand side of
(1:7) is a constant chosen so that the region will have the required
size $\alpha$. The reason why the critical region defined by (1:7) is most
powerful can be indicated as follows: For simplicity suppose that the
probability distributions under $H_0$ and $H_1$ are discrete. Thus,
$f_i(x_1) f_i(x_2) \cdots f_i(x_n)$ (i = 0, 1) denotes the probability of obtaining a
sample equal to the observed one. The critical region defined by (1:7)
can be built up by starting with a sample $E^1 = (x_1^1, x_2^1, \cdots, x_n^1)$
for which the ratio

$$\frac{f_1(x_1) \cdots f_1(x_n)}{f_0(x_1) \cdots f_0(x_n)}$$

takes its maximum value. Then a sample $E^2 = (x_1^2, \cdots, x_n^2)$ is included
for which this ratio takes its maximum value in the set of samples which
is left after $E^1$ has been removed from the totality of all possible
samples. In general, after r samples $E^1, \cdots, E^r$ have been included in the
critical region, a sample $E^{r+1}$ is added for which the ratio takes its
maximum value in the set of samples $(x_1, \cdots, x_n)$ which are left after
$E^1, \cdots, E^r$ have been removed from the totality of all samples. This
construction is continued until the size of the region reaches the
desired value $\alpha$. 4 Since
at any stage of the construction the last sample included in the critical
region has the largest probability under $H_1$ per unit probability under
$H_0$ as compared with any other sample not yet included in the region,
it can be seen that the probability measure of the critical region under
$H_1$, i.e., the power of the critical region, is greater than or equal to the
power of any other region of equal size.
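
The step-by-step construction can be written out directly for a small discrete case. The following is a sketch under assumed values: x takes the values 0 and 1 with probability of 1 equal to .1 under the null hypothesis and .3 under the alternative, n = 5, and the desired size is .05. The function name `most_powerful_region` is hypothetical, and the code assumes the distribution under the null hypothesis is positive for every value.

```python
from itertools import product
from math import prod

def most_powerful_region(f0, f1, values, n, alpha):
    """Greedy construction: repeatedly add the remaining sample with the
    largest ratio f1(sample)/f0(sample) until the size reaches alpha."""
    samples = list(product(values, repeat=n))
    p0 = {s: prod(f0(x) for x in s) for s in samples}  # probability under H0
    p1 = {s: prod(f1(x) for x in s) for s in samples}  # probability under H1
    ordered = sorted(samples, key=lambda s: p1[s] / p0[s], reverse=True)
    region, size, power = [], 0.0, 0.0
    for s in ordered:
        if size >= alpha:
            break
        region.append(s)
        size += p0[s]    # size: probability of the region under H0
        power += p1[s]   # power: probability of the region under H1
    return region, size, power

f0 = lambda x: 0.1 if x == 1 else 0.9
f1 = lambda x: 0.3 if x == 1 else 0.7
region, size, power = most_powerful_region(f0, f1, (0, 1), n=5, alpha=0.05)
print(len(region), size, power)
```

Because the variable is discrete, the last sample added carries the size from a value below .05 to a value slightly above it, so the returned size is not exactly the requested one.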

4 If x is a discrete variable, it may happen that, at the last stage of the construction,
at the inclusion of the last sample in the critical region, the size of the region
increases from a value below $\alpha$ to a value somewhat greater than $\alpha$.

Let us illustrate the principle for choosing a critical region by
application to a simple and familiar case. Let $H_0$ be the hypothesis that
x is normally distributed with mean $\theta_0$ and variance unity. Let $H_1$ be
the hypothesis that x is normally distributed with mean $\theta_1$ and
variance unity. Assume $\theta_1 > \theta_0$. For testing $H_0$ against $H_1$ we shall have
to determine the ratio

$$\frac{f_1(x_1) \cdots f_1(x_n)}{f_0(x_1) \cdots f_0(x_n)}$$

Since

$$f_1(x_1) \cdots f_1(x_n) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\sum_{a=1}^{n}(x_a - \theta_1)^2}$$

and

$$f_0(x_1) \cdots f_0(x_n) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\sum_{a=1}^{n}(x_a - \theta_0)^2}$$

the inequality (1:7) can be written as

$$\frac{e^{-\frac{1}{2}\sum_{a=1}^{n}(x_a - \theta_1)^2}}{e^{-\frac{1}{2}\sum_{a=1}^{n}(x_a - \theta_0)^2}} \ge k \tag{1:8}$$

Taking the logarithm on both sides of this inequality, we obtain

$$\tfrac{1}{2}\sum_{a=1}^{n}(x_a - \theta_0)^2 - \tfrac{1}{2}\sum_{a=1}^{n}(x_a - \theta_1)^2 = (\theta_1 - \theta_0)\sum_{a=1}^{n} x_a + \tfrac{1}{2}\, n(\theta_0^2 - \theta_1^2) \ge \log k$$

Hence, since $\theta_1 - \theta_0 > 0$,

$$\sum_{a=1}^{n} x_a \ge \frac{\log k - \frac{1}{2}\, n(\theta_0^2 - \theta_1^2)}{\theta_1 - \theta_0} = k' \quad \text{(say)} \tag{1:9}$$

Inequality (1:9) can be written as

$$\frac{\sum_{a=1}^{n}(x_a - \theta_0)}{n} \ge \frac{k' - n\theta_0}{n} = k'' \quad \text{(say)} \tag{1:10}$$

Now we shall determine the value of $k''$ such that the critical region
defined by the inequality (1:10) has the size $\alpha = .05$. Since under the
hypothesis $H_0$ the random variable $[\sum (x_a - \theta_0)]/n$ is normally
distributed with zero mean and variance 1/n, we see from a table of the
normal distribution that $k'' = 1.64/\sqrt{n}$. Thus, the most powerful
region of size .05 consists of all samples for which the inequality

$$\frac{\sum_{a=1}^{n}(x_a - \theta_0)}{n} \ge \frac{1.64}{\sqrt{n}} \tag{1:11}$$

holds.

This is a familiar result. Long before Neyman and Pearson
developed their theory of testing hypotheses, it had been the practice to
use the critical region (1:11) for testing the hypothesis that $\theta = \theta_0$
against alternative values $\theta > \theta_0$. A remarkable feature of the region
given by (1:11) is that it does not depend on the alternative value $\theta_1$.
In the derivation of (1:11) merely the inequality $\theta_1 > \theta_0$ was used.
Hence, the test defined by the region (1:11) is most powerful with
respect to all alternatives $\theta > \theta_0$, i.e., it is a uniformly most powerful
test when the alternatives are restricted to values greater than $\theta_0$.
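
The constant in (1:11) and the resulting power can be computed rather than read from a table. The sketch below uses Python's `statistics.NormalDist`; the function names are illustrative. The 5 per cent point it returns is about 1.645, which the text rounds to 1.64.

```python
from math import sqrt
from statistics import NormalDist

def critical_value(alpha, n):
    """k'' such that P(mean(x) - theta0 >= k'') = alpha under H0,
    for normal observations with unit variance."""
    return NormalDist().inv_cdf(1 - alpha) / sqrt(n)

def power(theta0, theta1, alpha, n):
    """Probability that the region (1:11) rejects H0 when the mean is theta1."""
    k2 = critical_value(alpha, n)
    # When the mean is theta1, mean(x) - theta0 is normal with
    # mean theta1 - theta0 and standard deviation 1/sqrt(n).
    return 1 - NormalDist(theta1 - theta0, 1 / sqrt(n)).cdf(k2)

print(critical_value(0.05, 1))       # the 5 per cent point of the normal table
for theta1 in (0.0, 0.25, 0.5, 1.0):
    print(theta1, power(0.0, theta1, 0.05, 16))
```

The same critical value is used for every alternative, and the power rises with the alternative mean, which reflects the uniformly most powerful property noted above.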

1.3.4 Number of Observations Necessary if $\alpha$ and $\beta$ Have
Preassigned Values

In the preceding section we assumed that $\alpha$ and the sample size n
were given and we were looking for a critical region for which $\beta$ was
a minimum. In this section we shall assume that $\alpha$ and $\beta$ are given
and our problem is to determine the minimum value of n for which
the power of the most powerful region of size $\alpha$ is greater than or equal
to $1 - \beta$.

Let $\beta_n$ denote the probability of an error of the second kind
associated with a most powerful critical region of size $\alpha$ when the test is
based on n observations. It can be shown that $\beta_n$ decreases, or at least
does not increase, with increasing n. In general, $\beta_n$ will approach 0
as n increases indefinitely. Denote by $n(\alpha, \beta)$ the smallest value of n
for which $\beta_n \le \beta$. If we want a test procedure such that the
probability of an error of the first kind is equal to $\alpha$ and the probability
of an error of the second kind does not exceed $\beta$, then according to the
current theory we must draw a sample of size $n \ge n(\alpha, \beta)$. If we use
a most powerful critical region, we need a sample of size $n = n(\alpha, \beta)$.
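
For the normal example of Section 1.3.3 the required sample size can be found by a direct search. The following sketch assumes unit variance and a one-sided alternative with mean above the null value; the function names `beta_n` and `required_sample_size` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta_n(theta0, theta1, alpha, n):
    """Error of the second kind of the most powerful region of size alpha
    for normal observations with unit variance and theta1 > theta0."""
    z = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(z - sqrt(n) * (theta1 - theta0))

def required_sample_size(theta0, theta1, alpha, beta):
    """Smallest n for which beta_n does not exceed beta."""
    n = 1
    while beta_n(theta0, theta1, alpha, n) > beta:
        n += 1
    return n

print(required_sample_size(0.0, 0.5, 0.05, 0.10))
```

The loop terminates because the error of the second kind in this case decreases steadily toward 0 as n increases.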

1.3.5 Testing a Hypothesis Viewed as a Decision between Two 
Courses of Action 

It happens frequently in practice that we have to decide between
two courses of action, say action 1 and action 2, and the preference
for one or the other action depends on the value of an unknown
parameter $\theta$ of the distribution of a random variable x. Denote by $\omega$ the
set of all values of $\theta$ for which action 2 is not preferable to action 1.
Thus, for any value $\theta$ not contained in $\omega$ we prefer action 2 to action 1.
The problem of deciding between these two actions on the basis of a
sample of n independent observations on x may be formulated as a
problem of testing the hypothesis H that the true value of $\theta$ is
contained in the set $\omega$. If the test procedure leads to the acceptance of
H we take action 1, and if it leads to the rejection of H we take
action 2.

Consider, for example, the following problem. A lot consisting of a
large number of units of a manufactured product is submitted for
acceptance inspection. Suppose that the proportion p of defectives
in the lot is unknown. There are two courses of action: acceptance of
the lot and rejection of the lot. In general, there will exist a particular
value p' of p such that if the true proportion of defectives is less than p' we
prefer acceptance and if p > p' we prefer rejection. If p = p' we are
indifferent which action is taken. Suppose that a decision is to be
made on the basis of a sample of n units drawn at random from the
lot. This problem may be viewed as a problem of testing the
hypothesis H that $p \le p'$ on the basis of a sample drawn from the lot. The
lot is accepted or rejected according as H is accepted or rejected.

As mentioned in Section 1.3.3, the choice of $\alpha$, i.e., the size of the
critical region, is greatly influenced by the relative importance we 
attach to errors of the first and second kinds. If the problem of test- 
ing a hypothesis arises out of the problem of deciding between certain 
two courses of action, the relative importance of the errors of the first 
and second kinds may be judged by considering the practical conse- 
quences of taking one action when the value of the parameter is such 
that the other action would have been preferable. 



Chapter 2. SEQUENTIAL TEST OF A STATISTICAL 
HYPOTHESIS: GENERAL DISCUSSION 

2.1 Notion of a Sequential Test 

In the current theory of testing hypotheses the number of observa- 
tions, i.e., the size of the sample on which the test is based, is treated 
as a constant for any particular problem. An essential feature of the 
sequential test, as distinguished from the current test procedure, is 
that the number of observations required by the sequential test de- 
pends on the outcome of the observations and is, therefore, not pre- 
determined, but a random variable. 

The sequential method of testing a hypothesis H may be described
as follows. A rule is given for making one of the following three
decisions at any stage of the experiment (at the mth trial for each integral
value of m): (1) to accept the hypothesis H, (2) to reject the hypothesis
H, (3) to continue the experiment by making an additional observa- 
tion. Thus, such a test procedure is carried out sequentially. On the 
basis of the first observation one of the aforementioned three decisions 
is made. If the first or second decision is made, the process is termi- 
nated. If the third decision is made, a second trial is performed. 
Again, on the basis of the first two observations one of the three deci- 
sions is made. If the third decision is made, a third trial is performed, 
and so on. The process is continued until either the first or the second 
decision is made. The number n of observations required by such a 
test procedure is a random variable, since the value of n depends on 
the outcome of the observations. 

For each positive integral value m, we shall denote by $M_m$ the
totality of all possible samples $(x_1, \cdots, x_m)$ of size m. We shall also
refer to $M_m$ as the m-dimensional sample space. A rule for making
one of the three decisions at any stage of the experiment can be
described as follows. For each integral value m, the m-dimensional sample
space is split into three mutually exclusive parts, $R_m^0$, $R_m^1$, and $R_m$.
After the first observation $x_1$ has been drawn, the hypothesis H that
is being tested is accepted if $x_1$ lies in $R_1^0$; H is rejected if $x_1$ lies in
$R_1^1$; or a second observation is made if $x_1$ lies in $R_1$. If the third
decision is made and a second observation $x_2$ drawn, H is accepted,
H is rejected, or a third observation is drawn, according as the
observed sample $(x_1, x_2)$ lies in $R_2^0$, $R_2^1$, or $R_2$. If $(x_1, x_2)$ lies in $R_2$,
a third observation $x_3$ is drawn and one of the three decisions is made
according as $(x_1, x_2, x_3)$ lies in $R_3^0$, $R_3^1$, or $R_3$, and so on. This process
is stopped when, and only when, either the first or the second decision
is made. 1 Thus, a sequential test is completely defined by defining
the sets $R_m^0$, $R_m^1$, and $R_m$ for all positive integral values m. Since
$R_m^0$, $R_m^1$, and $R_m$ are mutually exclusive and add up to the whole
sample space $M_m$, it is sufficient to define any two of the sets $R_m^0$,
$R_m^1$, and $R_m$. Any one of the three sets $R_m^0$, $R_m^1$, and $R_m$ consists
precisely of all those samples which are not contained in the other two.

We shall call a sample $(x_1, \cdots, x_m)$ ineffective if it contains an initial
segment $(x_1, \cdots, x_{m'})$, where $m' < m$, such that $(x_1, \cdots, x_{m'})$ lies in
$R_{m'}^0$ or in $R_{m'}^1$. A sample which is not ineffective will be said to be
an effective sample. Clearly, for a sequential test procedure we shall
have an effective sample at any stage of the experiment. Thus, in
defining the sets $R_m^0$, $R_m^1$, and $R_m$ we may disregard ineffective
samples. In other words, it is sufficient to state in which of the sets $R_m^0$,
$R_m^1$, and $R_m$ each effective sample $(x_1, \cdots, x_m)$ should be included,
since ineffective samples cannot occur during the sequential process.

The following is a simple example of a sequential test. Suppose that 
a lot consisting of a large number of units of a manufactured product 
is submitted for acceptance inspection. Each unit is classified in one 
of the two categories: defective and non-defective. The proportion p 
of defectives in the lot is unknown. The lot is considered acceptable 
if p is less than or equal to a given value p'. If p > p' we prefer to reject the lot. Thus,
we are interested in testing the hypothesis H that $p \le p'$. The following
procedure of testing H is a simple example of a sequential test.
Let $n_0$ denote a given integer. If the first $n_0$ units inspected are
non-defective, we stop inspection and the lot is accepted (H is accepted).
If for some value $m \le n_0$ the mth unit inspected is found defective,
no further units are inspected and the lot is rejected (H is rejected).
We shall assign the value 0 to any non-defective unit and the value 1
to any defective unit. In this example, a sample $(x_1, \cdots, x_m)$ is
effective if and only if $m \le n_0$ and $x_1 = \cdots = x_{m-1} = 0$. $R_m^0$
contains no effective sample for $m < n_0$, i.e., acceptance is not possible
for $m < n_0$. $R_{n_0}^0$ contains only one effective sample: $(0, 0, \cdots, 0)$.
For any $m \le n_0$ the set $R_m^1$ contains exactly one effective sample:
$(0, 0, \cdots, 0, 1)$.
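
The rule of this example is short enough to state as a procedure. The following is a minimal sketch; the function name `sequential_inspection` and the string labels for the decisions are illustrative choices, and the units are supplied as a sequence of 0 and 1 values.

```python
def sequential_inspection(units, n0):
    """Apply the rule of the example: reject at the first defective (1) found
    among the first n0 units, accept if the first n0 units are all 0.
    Returns (decision, number_of_observations)."""
    for m, x in enumerate(units, start=1):
        if x == 1:
            return "reject", m   # the sample (0, ..., 0, 1) lies in the rejection set
        if m == n0:
            return "accept", m   # the sample (0, ..., 0) of size n0 lies in the acceptance set
    raise ValueError("fewer than n0 units supplied before a decision was reached")

print(sequential_inspection([0, 0, 1, 0, 0], n0=5))
print(sequential_inspection([0, 0, 0, 0, 0], n0=5))
```

Units after the deciding observation are never examined, which corresponds to the remark that ineffective samples cannot occur during the sequential process.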

The sets $R_m^0$, $R_m^1$, and $R_m$ (m = 1, 2, $\cdots$) defining a sequential test
can be chosen in many ways, and a fundamental problem in the theory
of sequential tests is that of a proper choice of these sets. To formulate
principles for a proper choice of the sets $R_m^0$, $R_m^1$, and $R_m$ it is
necessary to study the consequences of any particular choice. This will be
done in the next section.

1 We shall consider only sequential tests for which the probability is one that the
process will eventually terminate.

2.2 Consequences of the Choice of Any Particular Sequential Test 

2.2.1 The Operating Characteristic Function 

After a particular sequential test has been adopted, i.e., a particular
choice of the sets $R_m^0$, $R_m^1$, and $R_m$ (m = 1, 2, $\cdots$) has been made,
the probability that the process will terminate with the acceptance of
the hypothesis $H_0$ under test depends only on the distribution of the
random variable x under consideration. As before, it is assumed that
the distribution of x is known except for the values of a finite number
of parameters, $\theta_1, \cdots, \theta_k$, say. Thus, the distribution of x is given
by a function $f(x, \theta_1, \cdots, \theta_k)$ where the functional form f is known, but
the true values of the parameters $\theta_1, \cdots, \theta_k$ are unknown. To simplify
notation, we shall use the letter $\theta$ without subscript to denote the set
of all k parameters $\theta_1, \cdots, \theta_k$. We shall refer to $\theta$ as a parameter point,
since $\theta$ can be represented geometrically by a point with the
coordinates $\theta_1, \cdots, \theta_k$. Since the distribution of x is determined by the
parameter point $\theta$, the probability of accepting $H_0$ will be a function
of $\theta$. This function will be denoted by $L(\theta)$ and will be called the
operating characteristic (OC) function. If there is only one unknown
parameter $\theta$ the function $L(\theta)$ can be plotted as a curve, $\theta$ being
measured along the horizontal axis and $L(\theta)$ along the vertical axis. Since
we shall consider only tests for which the probability that the
procedure will eventually terminate is equal to 1, the probability of
rejecting $H_0$ is equal to $1 - L(\theta)$.

The OC function is very closely related to the notion of the power
function in the current theory of tests. For any parameter point $\theta$
which is not consistent with the null hypothesis $H_0$, the power of the
test is defined as the probability of rejecting $H_0$ when $\theta$ is the true
point. Thus, for any $\theta$ not consistent with $H_0$ the power of the test
is equal to $1 - L(\theta)$.

To illustrate the meaning of an OC function, we shall compute the 
OC function of the particular sequential test given as an example in 
the preceding section. In that example the only unknown parameter 
is $\theta = p$, where p denotes the proportion of defectives in the lot. The
lot is accepted if, and only if, the first $n_0$ units inspected are
non-defective. The probability that the first unit inspected is non-defective
is equal to $1 - p$. On the assumption that the size of the lot is
sufficiently large as compared with $n_0$, the successive observations may
be treated as being independent. Then the probability that all $n_0$
units will be non-defective is equal to $(1 - p)^{n_0}$. Thus, the operating
characteristic function is given by

$$L(p) = (1 - p)^{n_0}$$

This function can be plotted, as shown in Fig. 4, by measuring p
along the horizontal axis and L(p) along the vertical axis.
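
A few points of this curve can be tabulated directly. The sketch below is an illustration only; the function name `oc` and the value 10 chosen for the number of units are assumptions.

```python
def oc(p, n0):
    """Operating characteristic L(p) = (1 - p)**n0: the probability that the
    first n0 units are all non-defective, so that the lot is accepted."""
    return (1 - p) ** n0

n0 = 10
for p in (0.0, 0.01, 0.05, 0.10, 0.20, 0.50, 1.0):
    # Acceptance probability L(p) and rejection probability 1 - L(p).
    print(p, oc(p, n0), 1 - oc(p, n0))
```

The function equals 1 at p = 0, decreases steadily as p grows, and equals 0 at p = 1; a larger number of units makes the decrease steeper.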



The OC function describes what the sequential test procedure
accomplishes. For any parameter point $\theta$ the probability of making a
correct decision can be obtained immediately from the OC function.
If the parameter point $\theta$ is consistent with the hypothesis $H_0$ to be
tested, then the probability of making a correct decision is equal to
$L(\theta)$. If the true parameter point $\theta$ is not consistent with the
hypothesis $H_0$, the probability of making a correct decision is equal to
$1 - L(\theta)$. Clearly, an OC function is considered more favorable the
higher the value of $L(\theta)$ for $\theta$ consistent with $H_0$ and the lower the
value of $L(\theta)$ for $\theta$ not consistent with $H_0$.

2.2.2 The Average (Expected) Sample Number (ASN) Function of 
a Sequential Test 

We have pointed out before that the number of observations re- 
quired by a sequential test is not predetermined, but is a random vari- 
able, because at any stage of the experiment the decision to terminate 
the process depends on the results of the observations made so far. 
For example, for the particular sequential test discussed in the
preceding section, the number of observations required by the test may
be anything from 1 to $n_0$. If no defects are found during the sampling
process, we shall make $n_0$ observations. On the other hand, if the
first $m - 1$ units inspected are non-defective and the mth unit is
defective for some value $m < n_0$, then the total number of observations
made will be equal to m.

We shall denote by n the number of observations required by the 
sequential test. Then n is a random variable. Carrying out the same 
sequential test procedure repeatedly, we shall obtain, in general, dif- 
ferent values for n. Of particular interest is the expected value of n 
(the average value of n in the long run, when the same test procedure 
is applied repeatedly). For any given test procedure the expected 
value of n depends only on the distribution of x. Since the
distribution of x is determined by the parameter point $\theta$, the expected value
of n depends only on $\theta$. For any given parameter point $\theta$, we shall
denote the expected value of n by $E_\theta(n)$. If there is only one unknown
parameter $\theta$ the function $E_\theta(n)$ can be plotted as a curve, $\theta$ being
measured along the horizontal axis and $E_\theta(n)$ along the vertical axis. We
shall refer to the average sample number function $E_\theta(n)$ briefly as the
ASN function.

As an example, we shall compute the ASN function for the particular
sequential test discussed in the preceding section. For any positive
integral value m < n_0, the probability that the test will be terminated
at the mth observation is given by (1 - p)^(m-1) p. We shall inspect n_0
units if and only if the first n_0 - 1 units are found non-defective.
Thus, the probability that the test will require exactly n_0 observations
is equal to (1 - p)^(n_0 - 1). Hence, the expected value of n is given by

$$E_p(n) = \sum_{m=1}^{n_0-1} m\,p\,(1-p)^{m-1} + n_0\,(1-p)^{n_0-1}$$

The graph of the ASN function will be of the type shown in Fig. 5. 
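
The ASN formula above can be evaluated numerically. The following is a minimal sketch in Python, not part of the original text; the function names asn_curtailed and asn_closed_form are our own. The closed form (1 - (1 - p)^n_0)/p is an assumption added here as a cross-check: it follows from summing the geometric series and is valid for p > 0.

```python
def asn_curtailed(p: float, n0: int) -> float:
    """Expected number of observations E_p(n) for the inspection test
    that stops at the first defective or after n0 units."""
    q = 1.0 - p
    # The test stops at observation m < n0 with probability p * q**(m - 1).
    early = sum(m * p * q ** (m - 1) for m in range(1, n0))
    # Exactly n0 units are inspected iff the first n0 - 1 are non-defective.
    return early + n0 * q ** (n0 - 1)


def asn_closed_form(p: float, n0: int) -> float:
    """Cross-check (our addition): (1 - (1 - p)**n0) / p, for p > 0."""
    return (1.0 - (1.0 - p) ** n0) / p
```

Evaluating asn_curtailed over a grid of p values between 0 and 1 traces out a curve that starts at n_0 for p = 0 and falls to 1 for p = 1.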



An OC function and an ASN function are associated with each test
procedure. These two functions are perhaps the most important
consequences of a test procedure. The OC function describes how well
the test procedure achieves its objective of making correct decisions, 
and the ASN function represents the price we have to pay, in terms 
of the number of observations required for the test. Thus, in judging 
the relative merits of two different test procedures, we shall compare 
the OC and ASN functions of these two tests. 

2.3 Principles for the Selection of a Sequential Test 

2.3.1 Degree of Preference for Acceptance or Rejection of the Null
Hypothesis H_0 as a Function of the Parameter θ

In order to set up principles for the selection of a sequential test
it is necessary to investigate the dependence of the preference for
rejection or acceptance of the null hypothesis H_0 on the parameter point
θ. Denote by ω the set of all those parameter points θ which are
consistent with H_0, i.e., H_0 is precisely the statement that the true
parameter point is included in the set ω. For example, if there is only
one unknown parameter θ and if H_0 is the hypothesis that θ is less
than or equal to a certain particular value θ_0, ω is the set of all values
θ for which θ ≤ θ_0. Since a correct decision is preferred to a wrong
decision, we can say that acceptance of H_0 is preferred whenever θ is
in ω, and rejection of H_0 is preferred whenever θ is outside ω.

The mere statement of preference for acceptance or rejection of H_0
is not yet a sufficient guide for the selection of a proper sequential test.
For this purpose it is necessary to know something about the degree
of preference for acceptance or rejection as a function of the parameter
point θ.

We shall denote by ω̄ the set of all parameter points which lie outside
ω. A point θ will be said to be on the boundary of ω, or a boundary
point of ω, if any arbitrarily small neighborhood of θ contains points
of ω as well as of ω̄. The totality of all boundary points of ω will be
called the boundary of ω. If, for example, there is only one unknown
parameter and ω is defined by θ ≤ θ_0, then θ_0 is the only boundary point
of ω. If ω is the set of all values θ for which θ_0 ≤ θ ≤ θ_1, then both θ_0
and θ_1 are boundary points. If the true parameter point θ lies in ω
but is near the boundary of ω, the preference for acceptance of H_0 will,
in general, be only slight. Similarly, if the true point θ lies in ω̄ but
near the boundary of ω, the preference for rejection of H_0 will be only
slight. In other words, the rejection of H_0 is not considered to be a
serious error if θ is in ω but near the boundary. Similarly, the
acceptance of H_0 is not considered a serious error if θ is in ω̄ but near the
boundary of ω. If the true point θ lies exactly on the boundary of ω,





there will be, in general, no definite preference for one or the other
action, i.e., it will be indifferent to us whether the hypothesis H_0 is
accepted or rejected.

In general, it will be possible to subdivide the totality of all
parameter points (parameter space) into three mutually exclusive zones: a
zone consisting of all points θ for which acceptance of H_0 is strongly
preferred; a zone consisting of points θ for which rejection of H_0 is
strongly preferred; and a zone consisting of all points θ which are not
included in either of the first two zones, i.e., the third zone consists
of all points θ for which neither acceptance nor rejection of H_0 is
strongly preferred. We shall refer to the first zone as the zone of
preference for acceptance, to the second zone as the zone of preference
for rejection, and to the third zone as the zone of indifference. The
zone of preference for acceptance will always be a subset of ω and the
zone of preference for rejection will be a subset of ω̄. The zone of
indifference will usually consist of points of ω and ω̄ which are near the
boundary or on the boundary of ω.

Although the subdivision of the parameter space into three zones as 
described above is used as a basis for the selection of a sequential test, 
it cannot be considered a statistical problem. Such a subdivision is 
made in each case on the basis of practical considerations concerning 
the consequences of a wrong decision. 

The subdivision of the parameter space into the above-mentioned
three zones gives a somewhat sketchy picture of the degree of
preference for acceptance or rejection as a function of the parameter θ.
A more refined description of the degree of preference for one or the
other action can be given in terms of two functions w_0(θ) and w_1(θ),
where w_0(θ) expresses the relative importance of, i.e., the loss caused
by, the error of accepting H_0 when θ is true, and w_1(θ) expresses the
relative importance of the error of rejecting H_0 when θ is true. The
function w_0(θ) = 0 for any θ in ω, since for such points θ the
acceptance of H_0 is a correct decision. For any θ in ω̄, w_0(θ) will have a
positive value which will, in general, increase with increasing distance
of θ from the boundary of ω. Similarly, w_1(θ) = 0 for all θ in ω̄ and
w_1(θ) > 0 for all θ in ω. Again, w_1(θ) will, in general, increase with
increasing distance of θ from the boundary of ω. Our subdivision of
the parameter space into three zones may be interpreted as being
equivalent to choosing the functions w_0(θ) and w_1(θ) as follows:
w_0(θ) = 0 when θ is in the zone of preference for acceptance or in the
zone of indifference. For any θ in the zone of preference for rejection,
w_0(θ) has a high positive value, say c_0, indicating that the loss caused
by acceptance is of practical importance. Similarly, w_1(θ) = 0 for any
θ in the zone of preference for rejection or in the zone of indifference.
For any θ in the zone of preference for acceptance, w_1(θ) has some high
value, say c_1, indicating that the loss caused by rejection of H_0 is of
practical importance. Although a refined description of the
dependence of the degree of preference for one or the other action on θ may
occasionally require the use of continuous functions w_0(θ) and w_1(θ),
the step functions implied by the subdivision of the parameter space
into three zones will give a sufficiently good approximation for most
practical purposes. They also have the advantage of great simplicity.
Thus, in what follows we shall assume that the dependence of the
preference for one or the other action on θ is described by a subdivision
of the parameter space into three zones of the type mentioned above.

As an illustration, we shall discuss briefly a few examples. Consider 
first the case in which a lot consisting of a large number of units of a 
manufactured product is submitted for acceptance inspection. Assum- 
ing that the units are classified in one of the two categories, defective 
and non-defective, the preference for acceptance or rejection of the lot 
depends only on the proportion p of defectives in the lot, which is 
unknown. In this case there is only one unknown parameter θ, which
is equal to the proportion p of defectives in the lot. It will, in general,
be possible to select two values p_0 and p_1 (p_0 < p_1) such that for any
p ≤ p_0 the rejection of the lot is an error of practical importance, for
any p ≥ p_1 the acceptance of the lot is considered a wrong decision of
practical importance, whereas for any value p between p_0 and p_1 there
is no strong preference for either action. Thus, the zone of indifference
may be defined as the interval from p_0 to p_1, the zone of preference for
acceptance as the set consisting of all values p ≤ p_0, and the zone of
preference for rejection as the set of all values p ≥ p_1.
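
The three zones of this example, together with the step functions w_0 and w_1 described earlier, can be written out in a few lines. The following Python sketch is our own illustration, not part of the original text; the names zone and step_losses and the default loss values c0 = c1 = 1 are hypothetical choices.

```python
def zone(p: float, p0: float, p1: float) -> str:
    """Classify the lot proportion defective p into one of the three zones."""
    if not 0.0 <= p0 < p1 <= 1.0:
        raise ValueError("require 0 <= p0 < p1 <= 1")
    if p <= p0:
        return "acceptance"      # zone of preference for acceptance
    if p >= p1:
        return "rejection"       # zone of preference for rejection
    return "indifference"        # strictly between p0 and p1


def step_losses(p: float, p0: float, p1: float,
                c0: float = 1.0, c1: float = 1.0) -> tuple:
    """Return (w0, w1): the loss from accepting and from rejecting the lot."""
    z = zone(p, p0, p1)
    w0 = c0 if z == "rejection" else 0.0   # accepting is costly only here
    w1 = c1 if z == "acceptance" else 0.0  # rejecting is costly only here
    return w0, w1
```

For instance, with p_0 = .02 and p_1 = .05 a lot with p = .03 falls in the zone of indifference and both losses are zero.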

As a second example, consider the case in which the hardness x of
a certain product varies from unit to unit such that x may be
considered a normally distributed variable in the population of all units
produced. Suppose that the mean value θ of x is unknown but that
the standard deviation of x is known. Assume that the most
desirable value of θ is θ_0 and that the product becomes less desirable as
the absolute deviation | θ - θ_0 | between the true mean and the most
desirable value θ_0 becomes greater. Suppose that the problem is to
decide whether the product should be put on the market or not. In
such a case, it will, in general, be possible to find a positive value c
such that if | θ - θ_0 | < c we prefer to put the product on the market,
and if | θ - θ_0 | > c we prefer to withhold the product. For | θ - θ_0 |
= c, we are indifferent which action is taken. Thus, the hypothesis
H_0 may be defined as the hypothesis that | θ - θ_0 | < c. We shall not
define the zone of indifference by the equation | θ - θ_0 | = c, since if
| θ - θ_0 | differs only slightly from c, the preference for one action over
the other is only slight and of no practical importance. However, it will
be possible to find a positive value Δ such that, if | θ - θ_0 | < c - Δ,
we strongly prefer to accept H_0 (to put the product on the market)
and, if | θ - θ_0 | > c + Δ, we strongly prefer to reject H_0 (not to put
the product on the market), whereas, if c - Δ ≤ | θ - θ_0 | ≤ c + Δ,
no strong preference is given to either action. Thus, the zone of
indifference may be defined by the inequality c - Δ ≤ | θ - θ_0 | ≤ c + Δ,
the zone of preference for acceptance by | θ - θ_0 | < c - Δ, and the
zone of preference for rejection by | θ - θ_0 | > c + Δ.

In each of the previous two examples there was only one unknown 
parameter. We shall now consider an example where there are two 
unknown parameters. Suppose that a lot consisting of a large num- 
ber of units of a manufactured product is submitted for acceptance 
inspection. Assume that the characteristic of the product in which 
we are interested is the resistance to pressure, which is a measurable 
quantity x. It is assumed that x varies from unit to unit in the lot 
and has a normal distribution with unknown mean μ and unknown
standard deviation σ. Let L be a value such that acceptance of the
lot is strongly preferred if the proportion of units in the lot with
resistance x ≤ L does not exceed .01, rejection of the lot is strongly
preferred if the proportion of units in the lot for which x ≤ L exceeds
.05, and no strong preference exists for either action if the proportion
of units in the lot with x ≤ L lies between .01 and .05. The
proportion of units with x ≤ L is greater than or equal to .05 if, and only if,
(μ - L)/σ ≤ λ_1, and the proportion of such units is ≤ .01 if, and only
if, (μ - L)/σ ≥ λ_2 (λ_1 < λ_2). The values λ_1 and λ_2 can be obtained
from a table of the normal distribution. Thus the zone of preference
for rejection is given by the set of all values μ and σ for which
(μ - L)/σ ≤ λ_1, the zone of preference for acceptance is given by
(μ - L)/σ ≥ λ_2, and the zone of indifference is given by λ_1 < (μ - L)/σ
< λ_2. These three zones are represented in Fig. 6, where μ is measured
along the horizontal axis and σ along the vertical axis. The zone of
indifference is bounded by two straight lines which go through the
point L on the abscissa axis and have slopes 1/λ_1 and 1/λ_2, respectively.
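
In place of a printed table, the values λ_1 and λ_2 can be obtained from the inverse of the normal distribution function. The sketch below is our own illustration and uses the NormalDist class of the Python standard library; the function names lambda_for and zone are hypothetical.

```python
from statistics import NormalDist


def lambda_for(fraction_below: float) -> float:
    """Value lam such that the proportion of units with x <= L equals
    fraction_below exactly when (mu - L) / sigma = lam."""
    # P(x <= L) = Phi((L - mu) / sigma), hence (mu - L) / sigma = -Phi^{-1}(.)
    return -NormalDist().inv_cdf(fraction_below)


def zone(mu: float, sigma: float, L: float) -> str:
    """Classify the parameter point (mu, sigma) for the .01 / .05 limits."""
    lam1 = lambda_for(0.05)  # about 1.645
    lam2 = lambda_for(0.01)  # about 2.326
    t = (mu - L) / sigma
    if t <= lam1:
        return "rejection"
    if t >= lam2:
        return "acceptance"
    return "indifference"
```

The two boundary lines of the zone of indifference in the (μ, σ) plane are then σ = (μ - L)/λ_1 and σ = (μ - L)/λ_2.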

2.3.2 Requirements Imposed on the OC Function 

Suppose that the hypothesis H_0 to be tested states that the true
parameter point θ lies in a given set ω of parameter points. Then we
wish to make the probability of accepting H_0 as high as possible when
θ lies in ω, and as low as possible when θ is outside ω. Since the
probability of accepting H_0 is by definition equal to the OC function L(θ),
an OC function is considered more desirable the higher the value of
L(θ) for any θ in ω and the lower the value of L(θ) for any θ outside ω.
An ideal OC function would be given by a function L(θ) such that
L(θ) = 1 for any θ in ω and L(θ) = 0 for any θ outside ω. Suppose,
for example, that there is only one unknown parameter θ and the
hypothesis to be tested is the statement that θ ≤ θ_0. Then, an ideal
OC function, as shown in Fig. 7, would be given by a function L(θ)
such that L(θ) = 1 for θ ≤ θ_0 and L(θ) = 0 for θ > θ_0.

[Fig. 7. Example of an ideal OC function: L(θ) on the vertical axis, θ on the horizontal axis, with θ_0 marked.]

The ideal form of the OC function can never be achieved on the 
basis of incomplete information about θ supplied by a random sample
drawn from the population, but it can be approached arbitrarily closely 
if we are willing to take a sufficiently large sample. 

The nearer the OC function is to the ideal function and the smaller 
the expected number of observations required, the more desirable is 
the sequential test. These two desirable features of a test are
somewhat in conflict, since the closer we approach the ideal form of the
OC function, the larger, in general, will be the number of observations 
required by the test. To achieve a compromise between these two 
conflicting desiderata, we may proceed as follows. First we formulate 
requirements concerning the closeness of the OC function to the ideal 
function and then consider only tests which satisfy these requirements. 
From these tests we try to select one for which the expected number 
of observations required by the test is as small as possible. To impose 
the desired conditions on the OC function first and then to minimize 
with respect to the expected number of observations does not seem to 
be an unreasonable procedure, since the OC function is perhaps of 
primary importance. 

To formulate requirements on the OC function, we shall make use
of the subdivision of the parameter space into the three zones discussed
in the preceding section. Since in the zone of indifference there is no
strong preference for one or the other action, we shall not impose any
conditions on the behavior of L(θ) within the zone of indifference. In
the zones of preference for acceptance and rejection the requirements
on the OC function may reasonably be stated as follows. For any θ
in the zone of preference for acceptance the probability of rejecting
the hypothesis H_0, i.e., the value of 1 - L(θ), should be less than or
equal to a preassigned value α, and for any θ in the zone of preference
for rejection the probability of accepting H_0, i.e., the value of L(θ),
should be less than or equal to a preassigned value β.

We can summarize the requirements imposed on the OC function
as follows. First the parameter space is subdivided into three mutually
exclusive zones: a zone of preference for acceptance, a zone of
preference for rejection, and a zone of indifference. Then two positive values
α and β, both < 1, are selected. The requirements imposed on the
OC function are then given by the two following conditions:

(2:1)   $$1 - L(\theta) \le \alpha \quad \text{for any } \theta \text{ in the zone of preference for acceptance}$$

(2:2)   $$L(\theta) \le \beta \quad \text{for any } \theta \text{ in the zone of preference for rejection}$$

Condition (2:1) can also be written as

(2:3)   $$L(\theta) \ge 1 - \alpha \quad \text{for any } \theta \text{ in the zone of preference for acceptance}$$

The subdivision of the parameter space into three zones, as well as
the choice of the values α and β, is to be made on the basis of
practical considerations in each particular case. We shall say that a
sequential test is admissible if it satisfies the requirements (2:2)
and (2:3).
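
To make conditions (2:2) and (2:3) concrete, the following Python sketch checks them for a simple non-sequential plan. The plan itself is our own assumption and does not come from the text: inspect n units and accept the lot if at most c are defective, so that L(p) is a binomial sum. The zones are taken as p ≤ p_0 and p ≥ p_1, as in the first example of Section 2.3.1, and the function names are hypothetical.

```python
from math import comb


def oc_single_sample(p: float, n: int, c: int) -> float:
    """L(p): probability of at most c defectives among n inspected units."""
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(c + 1))


def is_admissible(n: int, c: int, p0: float, p1: float,
                  alpha: float, beta: float) -> bool:
    """Check (2:3) on p <= p0 and (2:2) on p >= p1 for this plan."""
    # L(p) is non-increasing in p for this plan, so the worst cases are
    # the zone endpoints p0 and p1; checking them covers both zones.
    return (oc_single_sample(p0, n, c) >= 1.0 - alpha
            and oc_single_sample(p1, n, c) <= beta)
```

The same two-endpoint check applies to any test whose OC function is monotone in the parameter; for a non-monotone OC function every point of both zones would have to be examined.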





A typical OC function satisfying the conditions (2:2) and (2:3) is
shown in Fig. 8, where there is only one unknown parameter θ, the
zone of preference for acceptance is defined by θ ≤ θ_0, and the zone of
preference for rejection is defined by θ ≥ θ_1 (θ_0 < θ_1).



2.3.3 The ASN Function as a Basis for the Selection of a Sequential
Test

After the parameter space has been subdivided into three zones and
the quantities α and β have been chosen, we consider only tests which
are admissible, i.e., tests which satisfy the conditions (2:2) and (2:3).
Clearly, we wish to select a sequential test for which the expected value
of the number of observations required by the test is as small as
possible. This expected value E_θ(n) depends, as we have seen in Section
2.2.2, on the parameter point θ. In Section 2.2.2 we referred to the
function E_θ(n) as the ASN function of the test.

The expected value E_θ(n) of the number of observations to be made
depends, of course, also on the particular sequential test used. To put
this dependence in evidence, we shall occasionally use the symbol
E_θ(n | S) to denote the value E_θ(n) when the sequential test S is applied.

It is of particular interest to consider for any particular θ the
minimum² value of E_θ(n | S) with respect to S, where S may be any
admissible sequential test. This minimum value, in symbols Min_S E_θ(n | S),
depends only on θ. Clearly, for any admissible sequential test S' we
have

$$E_\theta(n \mid S') \ge \min_{S} E_\theta(n \mid S)$$

If an admissible sequential test S_0 exists for which the expected value
of the number of observations is minimized for all θ, i.e., for which
E_θ(n | S_0) = Min_S E_θ(n | S) for all θ, then S_0 may be regarded as a
“uniformly best” test. In general, however, no uniformly best test
exists,³ i.e., it will not be possible to minimize the expected value of
the required number of observations simultaneously for all θ. Thus,
in such cases some compromise principle is to be adopted for the
selection of a sequential test. We do not propose to enter into a discussion
of the various possible compromise principles that could be advanced,
since the various possibilities have not yet been fully investigated.
However, for the particular, but theoretically very interesting, case
when a simple hypothesis is tested against a single alternative, the
situation has been clarified and we shall discuss it in some detail in
the next section.

² If the minimum value does not exist, we can take the greatest lower bound with
respect to S.

2.4 The Case When a Simple Hypothesis H_0 Is Tested against a
Single Alternative H_1

2.4.1 Efficiency of a Sequential Test 

We shall consider only two values of the parameter θ, say θ_0 and θ_1.
Let H_0 be the hypothesis that θ = θ_0 and let H_1 denote the hypothesis
that θ = θ_1. We shall refer to H_0 as the null hypothesis and to H_1 as
the alternative hypothesis. With any sequential test of the hypothesis
H_0 against the alternative hypothesis H_1 there will be associated two
numbers α and β between 0 and 1 such that if H_0 is true the
probability is α that we shall commit an error of the first kind (we shall
reject H_0), and if H_1 is true the probability is β that we shall commit
an error of the second kind (we shall accept H_0). Two sequential tests
S and S' will be said to be of equal strength if the values α and β
associated with S are equal to the corresponding values α' and β'
associated with S'. If α < α' and β ≤ β', or if α ≤ α' and β < β', we
shall say that S is stronger than S' (S' is weaker than S). If α < α'
and β > β', or if α > α' and β < β', we shall say that the strength of
S is not comparable to that of S'.

Restricting ourselves to sequential tests of a given strength (α, β),
a test may be regarded as more desirable the smaller the expected
number of observations required by the test. If S and S' are two
sequential tests of equal strength such that E_{θ_0}(n | S) ≤ E_{θ_0}(n | S') and
E_{θ_1}(n | S) < E_{θ_1}(n | S'), or E_{θ_0}(n | S) < E_{θ_0}(n | S') and
E_{θ_1}(n | S) ≤ E_{θ_1}(n | S'), the test S will be considered preferable to S'.
If a test S_0 exists such that E_{θ_0}(n | S_0) ≤ E_{θ_0}(n | S) and
E_{θ_1}(n | S_0) ≤ E_{θ_1}(n | S) for all tests S of strength equal to that of S_0,
we shall say that S_0 is an optimum test.

We shall denote by n_0(α, β) the minimum value of E_{θ_0}(n | S) with
respect to S, and by n_1(α, β) the minimum value of E_{θ_1}(n | S) with
respect to S, where S may be any sequential test of strength (α, β).⁴
Then for any sequential test S of strength (α, β) we have
E_{θ_0}(n | S) ≥ n_0(α, β) and E_{θ_1}(n | S) ≥ n_1(α, β). A sequential test S of
strength (α, β) is an optimum test if E_{θ_0}(n | S) = n_0(α, β) and
E_{θ_1}(n | S) = n_1(α, β). The existence of an optimum test has not been proved.
However, it will be shown in Section A.7 of the Appendix that for the
so-called sequential probability ratio test S_0 of strength (α, β), defined
in Chapter 3, the ratios

(2:4)   $$\frac{E_{\theta_0}(n \mid S_0)}{n_0(\alpha, \beta)} \quad \text{and} \quad \frac{E_{\theta_1}(n \mid S_0)}{n_1(\alpha, \beta)}$$

can exceed 1 only by very small quantities which can be neglected for
practical purposes. Thus, for all practical purposes, the sequential
probability ratio test may be regarded as an optimum test.⁵ In
Section A.7 it is also shown that the ratios (2:4) converge to 1 as θ_1
approaches θ_0.

³ The situation here is similar to that in the Neyman-Pearson theory of testing
hypotheses, where uniformly most powerful tests exist only in exceptional cases.

⁴ If the minimum value with respect to S does not exist, we take the greatest
lower bound.

⁵ The author conjectures that the sequential probability ratio test is exactly an
optimum test, but he did not succeed in proving this.

We shall define the efficiency of a sequential test S of strength (α, β)
by the ratio

$$\frac{n_0(\alpha, \beta)}{E_{\theta_0}(n \mid S)}$$

when H_0 is true, and by

$$\frac{n_1(\alpha, \beta)}{E_{\theta_1}(n \mid S)}$$

when H_1 is true. Clearly, the efficiency of a sequential test under H_0, as well as
under H_1, lies always between 0 and 1. The greater the efficiency of
a sequential test of a given strength the more desirable it is. An
optimum test has the efficiency 1 under H_0, as well as under H_1. The
sequential probability ratio test for testing H_0 against H_1 is shown in
Section A.7 to have an efficiency, if not exactly, very nearly equal to 1
under H_0 as well as under H_1. As mentioned before, in Section A.7
it is shown that the efficiency of the sequential probability ratio test
approaches 1 under H_0 as well as under H_1, when θ_1 approaches θ_0.

2.4.2 Efficiency of the Current Test Procedure, Viewed as a
Particular Case of a Sequential Test

The current test procedure may be regarded as a particular case of
a sequential test. In fact, if N denotes the fixed number of
observations used in the current procedure and if W_N denotes the critical region,
i.e., W_N is the totality of all those samples of size N for which the
hypothesis under test is rejected, then the current procedure may be
regarded as a sequential test defined as follows. For all positive
integral values m < N, the regions R_m^0 and R_m^1 are the empty subsets of the
m-dimensional sample space M_m, and R_m = M_m. For m = N, R_N^1 is
equal to W_N, R_N^0 is equal to the totality of all samples of size N not
contained in R_N^1, and R_N is the empty set. Thus, for the current test
procedure we have E_{θ_0}(n) = E_{θ_1}(n) = N.

It will be shown later that the efficiency of the current test for
testing H_0 against H_1, based on the most powerful critical region, is rather
low. Frequently it is below 1/2. In other words, an optimum
sequential test can attain the same α and β as the current most powerful test
on the basis of an expected number of observations much smaller than
the fixed number of observations needed for the current most powerful
test.

In Chapter 3 a simple sequential test procedure for testing H_0 against
H_1 will be proposed. It is called the sequential probability ratio test
and, for practical purposes, can be regarded as an optimum sequential
test. It will be seen that these sequential tests usually lead to average
savings of about 50 per cent in the number of trials as compared with
the current most powerful test.



Chapter 3. THE SEQUENTIAL PROBABILITY RATIO TEST
FOR TESTING A SIMPLE HYPOTHESIS H_0 AGAINST A SINGLE
ALTERNATIVE H_1

3.1 Definition of the Sequential Probability Ratio Test 

Let f(x, θ) denote the distribution of the random variable x under
consideration.¹ Let H_0 be the hypothesis that θ = θ_0, and H_1 the
hypothesis that θ = θ_1. Thus, the distribution of x is given by f(x, θ_0)
when H_0 is true, and by f(x, θ_1) when H_1 is true. We shall denote the
successive observations on x by x_1, x_2, ..., etc.

As mentioned before, we consider only two cases: (1) x admits a 
probability density function; (2) x has a discrete distribution. It is 
our intention to cover both cases simultaneously. However, the diffi- 
culty arises that some statements will have to be formulated slightly 
differently, depending on whether x admits a density function or x has 
a discrete distribution. This difference in formulation is caused mostly 
by the fact that “probability density” in the continuous case is to be 
replaced by “probability” in the discrete case. For the sake of brevity, 
we shall occasionally use the word “probability” to mean “probability 
density” in the continuous case, if this can be done without danger of 
confusion. With this understanding it will frequently be possible to 
cover the discrete, as well as the continuous, case with a single statement. 

For any positive integral value m the probability that a sample
x_1, ..., x_m is obtained is given by

$$p_{1m} = f(x_1, \theta_1) \cdots f(x_m, \theta_1)$$

when H_1 is true, and by

$$p_{0m} = f(x_1, \theta_0) \cdots f(x_m, \theta_0)$$

when H_0 is true.

The sequential probability ratio test for testing H_0 against H_1 is
defined as follows: Two positive constants A and B (B < A) are chosen.
At each stage of the experiment (at the mth trial for any integral
value m), the probability ratio p_{1m}/p_{0m} is computed. If

(3:1)   $$B < \frac{p_{1m}}{p_{0m}} < A$$

the experiment is continued by taking an additional observation. If

(3:2)   $$\frac{p_{1m}}{p_{0m}} \ge A$$

the process is terminated with the rejection of H_0 (acceptance of H_1).
If

(3:3)   $$\frac{p_{1m}}{p_{0m}} \le B$$

the process is terminated with the acceptance of H_0.²

¹ f(x, θ) denotes the probability density function of x, if a density function exists.
If x has a discrete distribution, f(x, θ) denotes the probability that the random
variable under consideration takes the value x.

The constants A and B are to be determined so that the test will
have the prescribed strength (α, β). The relations among the
quantities α, β, A, and B will be discussed in the next section.

For purposes of practical computation, it is much more convenient
to compute the logarithm of the ratio p_{1m}/p_{0m} than the ratio p_{1m}/p_{0m}
itself. The reason for this is that log (p_{1m}/p_{0m}) can be written as the
sum of m terms, i.e.,

(3:4)   $$\log \frac{p_{1m}}{p_{0m}} = \log \frac{f(x_1, \theta_1)}{f(x_1, \theta_0)} + \cdots + \log \frac{f(x_m, \theta_1)}{f(x_m, \theta_0)}$$

We shall denote the ith term in this sum by z_i, i.e.,

$$z_i = \log \frac{f(x_i, \theta_1)}{f(x_i, \theta_0)}$$
The test procedure is carried out as follows, the quantities z_i (i =
1, 2, ...) being used: At each stage of the experiment (at the mth trial
for each integral value of m), the cumulative sum z_1 + ... + z_m is
computed. If

(3:6)   $$\log B < z_1 + \cdots + z_m < \log A$$

the experiment is continued by taking an additional observation. If

$$z_1 + \cdots + z_m \ge \log A$$

the process is terminated with the rejection of H_0. If

$$z_1 + \cdots + z_m \le \log B$$

the process is terminated with the acceptance of H_0.
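
The procedure just described is a simple loop over the observations. The following Python sketch is our own illustration and not part of the original text; the function name sprt, its argument names, and the decision strings are hypothetical. The caller supplies log_ratio, a function returning z_i = log f(x_i, θ_1)/f(x_i, θ_0) for a single observation.

```python
from math import log
from typing import Callable, Iterable, Tuple


def sprt(observations: Iterable[float],
         log_ratio: Callable[[float], float],
         A: float, B: float) -> Tuple[str, int]:
    """Run the test on a stream of observations.

    Returns the decision and the number m of observations used. If the
    data run out while the sum is still between log B and log A, the
    decision is "continue".
    """
    if not 0.0 < B < A:
        raise ValueError("require 0 < B < A")
    log_a, log_b = log(A), log(B)
    total = 0.0   # cumulative sum z_1 + ... + z_m
    m = 0
    for m, x in enumerate(observations, start=1):
        total += log_ratio(x)
        if total >= log_a:
            return "reject H0", m
        if total <= log_b:
            return "accept H0", m
    return "continue", m
```

Only the running sum has to be stored between observations, which is the practical advantage of working with logarithms.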

² If for a particular sample p_{1m} = p_{0m} = 0, we shall define the value of the ratio
p_{1m}/p_{0m} as 1. If for some sample (x_1, ..., x_m) we have p_{1m} > 0 but p_{0m} = 0,
inequality (3:2) is considered fulfilled and H_0 is rejected.

A few simple illustrations will help to make the procedure more
concrete. Suppose that the random variable x can take only two values,
0 and 1. We shall denote the probability that x = 1 by p, the value
of which is assumed to be unknown. Thus, p is the unknown
parameter of the distribution. The distribution of x is given by the function
f(x, p), which is defined only for two values of x, namely x = 0 and
x = 1: f(1, p) = p and f(0, p) = 1 - p. Let H_0 be the hypothesis
that p = p_0 and H_1 the hypothesis that p = p_1 (p_1 ≠ p_0). Then

$$z_i = \log \frac{f(x_i, p_1)}{f(x_i, p_0)} = \log \frac{p_1}{p_0} \quad \text{if } x_i = 1$$

$$z_i = \log \frac{1 - p_1}{1 - p_0} \quad \text{if } x_i = 0$$

Hence,

(3:7)   $$z_1 + \cdots + z_m = m^* \log \frac{p_1}{p_0} + (m - m^*) \log \frac{1 - p_1}{1 - p_0}$$

where m* denotes the number of ones in the sequence of the first m
observations. We accept H_0 if

$$m^* \log \frac{p_1}{p_0} + (m - m^*) \log \frac{1 - p_1}{1 - p_0} \le \log B$$

We reject H_0 (accept H_1) if

$$m^* \log \frac{p_1}{p_0} + (m - m^*) \log \frac{1 - p_1}{1 - p_0} \ge \log A$$

We continue the experiment by taking an additional observation if

$$\log B < m^* \log \frac{p_1}{p_0} + (m - m^*) \log \frac{1 - p_1}{1 - p_0} < \log A$$

The expression (3:7) can, of course, be obtained cumulatively. If an
observation is a one, the constant log (p_1/p_0) is added to the preceding
value of (3:7) to obtain the new value. If the observation is a zero,
the constant log [(1 - p_1)/(1 - p_0)] is added.
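
The cumulative computation for the two-valued case may be sketched as follows. This Python fragment is our own illustration; the names statistic_3_7 and bernoulli_sprt are hypothetical, and p_0 and p_1 are assumed to lie strictly between 0 and 1 so that both logarithms exist.

```python
from math import log
from typing import Iterable, Tuple


def statistic_3_7(m_star: int, m: int, p0: float, p1: float) -> float:
    """Expression (3:7) for m observations of which m_star are ones."""
    return m_star * log(p1 / p0) + (m - m_star) * log((1 - p1) / (1 - p0))


def bernoulli_sprt(xs: Iterable[int], p0: float, p1: float,
                   A: float, B: float) -> Tuple[str, int, float]:
    """Return (decision, m, cumulative sum) for 0/1 observations xs."""
    step_one = log(p1 / p0)               # added when the observation is 1
    step_zero = log((1 - p1) / (1 - p0))  # added when the observation is 0
    log_a, log_b = log(A), log(B)
    total, m = 0.0, 0
    for m, x in enumerate(xs, start=1):
        total += step_one if x == 1 else step_zero
        if total >= log_a:
            return "reject H0", m, total
        if total <= log_b:
            return "accept H0", m, total
    return "continue", m, total
```

The running total agrees with statistic_3_7 evaluated at the current counts, so either form may be used; the cumulative form needs only one addition per observation.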

As a second example, consider the problem of testing a hypothesis
about the mean of a normal distribution. Let x be a normally
distributed random variable with unknown mean θ and unit variance.
Let H_0 be the hypothesis that θ = θ_0 and H_1 the hypothesis that
θ = θ_1. Then

$$f(x_i, \theta_1) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x_i - \theta_1)^2}$$

and

$$f(x_i, \theta_0) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x_i - \theta_0)^2}$$

Hence,

$$z_i = \log \frac{f(x_i, \theta_1)}{f(x_i, \theta_0)} = (\theta_1 - \theta_0)\, x_i + \frac{1}{2}(\theta_0^2 - \theta_1^2)$$

and

$$\log \frac{p_{1m}}{p_{0m}} = z_1 + \cdots + z_m = (\theta_1 - \theta_0) \sum_{i=1}^{m} x_i + \frac{m}{2}(\theta_0^2 - \theta_1^2)$$

If

$$(\theta_1 - \theta_0) \sum_{i=1}^{m} x_i + \frac{m}{2}(\theta_0^2 - \theta_1^2) \ge \log A$$

the process is terminated with the rejection of H_0. If

$$(\theta_1 - \theta_0) \sum_{i=1}^{m} x_i + \frac{m}{2}(\theta_0^2 - \theta_1^2) \le \log B$$

the process is terminated with the acceptance of H_0. If

$$\log B < (\theta_1 - \theta_0) \sum_{i=1}^{m} x_i + \frac{m}{2}(\theta_0^2 - \theta_1^2) < \log A$$

the experiment is continued by taking an additional observation.
Again, log (p_{1m}/p_{0m}) can be computed cumulatively if after each
observation x_i we compute (θ_1 - θ_0) x_i + 1/2 (θ_0² - θ_1²) and add it to the
preceding value of log (p_{1m}/p_{0m}).
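
The same cumulative scheme for the normal case with unit variance can be sketched in Python. The fragment below is our own illustration, not part of the original text; the function name normal_mean_sprt and the decision strings are hypothetical.

```python
from math import log
from typing import Iterable, Tuple


def normal_mean_sprt(xs: Iterable[float], theta0: float, theta1: float,
                     A: float, B: float) -> Tuple[str, int]:
    """Test theta = theta0 against theta = theta1 for unit-variance data."""
    log_a, log_b = log(A), log(B)
    slope = theta1 - theta0                      # coefficient of x_i in z_i
    const = 0.5 * (theta0 ** 2 - theta1 ** 2)    # constant part of z_i
    total, m = 0.0, 0
    for m, x in enumerate(xs, start=1):
        total += slope * x + const               # add z_i to log(p_1m / p_0m)
        if total >= log_a:
            return "reject H0", m
        if total <= log_b:
            return "accept H0", m
    return "continue", m
```

With θ_0 = 0 and θ_1 = 1 each observation contributes x_i - 1/2, so observations above the midpoint of the two means push the sum toward log A and observations below it push the sum toward log B.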


3.2 Fundamental Relations among the Quantities α, β, A, and B

In this section we shall derive certain inequalities satisfied by the
quantities α, β, A, and B which will provide the basis for determining
the constants A and B in the sequential probability ratio test.

We shall say a sample (x_1, ..., x_n) is of type 0 if

$$B < \frac{p_{1m}}{p_{0m}} = \frac{f(x_1, \theta_1) \cdots f(x_m, \theta_1)}{f(x_1, \theta_0) \cdots f(x_m, \theta_0)} < A \quad \text{for } m = 1, \ldots, n - 1$$

and

$$\frac{p_{1n}}{p_{0n}} \le B$$

Similarly, we shall say a sample (x_1, ..., x_n) is of type 1 if

$$B < \frac{p_{1m}}{p_{0m}} = \frac{f(x_1, \theta_1) \cdots f(x_m, \theta_1)}{f(x_1, \theta_0) \cdots f(x_m, \theta_0)} < A \quad \text{for } m = 1, \ldots, n - 1$$

and

$$\frac{p_{1n}}{p_{0n}} \ge A$$

Thus, a sample of type 0 leads to the acceptance of H_0 and a sample
of type 1 leads to the acceptance of H_1 (rejection of H_0).

Clearly, for any given sample $(x_1, \ldots, x_n)$ of type 1 the
probability of obtaining such a sample is at least $A$ times as large
under hypothesis $H_1$ as under hypothesis $H_0$. Thus, the probability
measure of the totality of all samples of type 1 is also at least $A$
times as large under $H_1$ as under $H_0$. The probability measure of
the totality of all samples of type 1 is the same as the probability
that the sequential process will terminate with the acceptance of $H_1$
(rejection of $H_0$). But the latter probability is equal to $\alpha$
when $H_0$ is true and to $1 - \beta$ when $H_1$ is true.³ Thus, we
obtain the inequality

$$1 - \beta \ge A\alpha \tag{3:8}$$

This inequality can be written as

$$A \le \frac{1 - \beta}{\alpha} \tag{3:9}$$

Thus, $(1 - \beta)/\alpha$ is an upper limit for $A$.

³ The probability that $H_0$ will be accepted when $H_1$ is true is by
definition equal to $\beta$. Section A.1 of the Appendix shows that the
probability is one that the sequential process will eventually
terminate. Thus, the probability that $H_0$ will be rejected when $H_1$
is true must be equal to $1 - \beta$.

A lower limit for $B$ can be derived in a similar way. In fact, for
any given sample $(x_1, \ldots, x_n)$ of type 0 the probability of
obtaining such a sample under $H_1$ is at most $B$ times as large as
the probability of obtaining such a sample when $H_0$ is true. Thus,
also the probability of accepting $H_0$ is at most $B$ times as large
when $H_1$ is true as when $H_0$ is true. Since the probability of
accepting $H_0$ is $1 - \alpha$ when $H_0$ is true and $\beta$ when
$H_1$ is true, we obtain the inequality

$$\beta \le (1 - \alpha) B \tag{3:10}$$

This inequality can be written as

$$B \ge \frac{\beta}{1 - \alpha} \tag{3:11}$$

Thus, $\beta/(1 - \alpha)$ is a lower limit for $B$.

Inequalities (3:8) and (3:10) can also be written as

$$\frac{\alpha}{1 - \beta} \le \frac{1}{A} \tag{3:12}$$

and

$$\frac{\beta}{1 - \alpha} \le B \tag{3:13}$$

These inequalities are of considerable value in practical applications,
since they furnish upper limits for $\alpha$ and $\beta$ for given
values of $A$ and $B$. For example, it follows from these inequalities
that

$$\alpha \le \frac{1}{A} \tag{3:14}$$

and

$$\beta \le B \tag{3:15}$$
It may be of interest to represent graphically the totality of all
pairs $(\alpha, \beta)$ which satisfy the inequalities (3:12) and
(3:13). Any pair $(\alpha, \beta)$ can be represented by a point in the
plane with abscissa $\alpha$ and ordinate $\beta$. Consider the straight
lines $L_1$ and $L_2$ in the plane given by the equations

$$\alpha A = 1 - \beta \tag{3:16}$$

and

$$\beta = B(1 - \alpha) \tag{3:17}$$

respectively. The line $L_1$ intersects the abscissa axis at
$\alpha = 1/A$ and the ordinate axis at $\beta = 1$. Similarly, the line
$L_2$ intersects the abscissa axis at $\alpha = 1$ and the ordinate axis
at $\beta = B$. The region consisting of all points $(\alpha, \beta)$
which satisfy the inequalities (3:12) and (3:13) is the interior and the
boundary of the quadrilateral determined by the lines $L_1$, $L_2$, and
the coordinate axes. This region is shown by the shaded area in Fig. 9.
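
Membership in this region is easy to check numerically. The sketch below is illustrative only; the function names `upper_limits` and `in_admissible_region` are hypothetical, and the second function assumes $0 < \alpha < 1$ and $0 < \beta < 1$.

```python
def upper_limits(A, B):
    """Upper limits for alpha and beta implied by (3:14) and (3:15)."""
    return 1.0 / A, B


def in_admissible_region(alpha, beta, A, B):
    """True if the pair (alpha, beta) satisfies both (3:12) and (3:13)."""
    return alpha / (1 - beta) <= 1.0 / A and beta / (1 - alpha) <= B
```

For example, with the assumed constants A = 19 and B = 0.1 the pair (0.04, 0.05) lies inside the quadrilateral, while (0.06, 0.05) violates (3:12).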

The inequalities (3:12) and (3:13) have been derived under the
assumption that the successive observations $x_1, x_2, \ldots$, etc.,
are independent observations on $x$. The assumption of the independence
of the observations has been used in showing that the probability is one
that the sequential process will eventually terminate.⁴ The rest of the
derivation, however, remains valid also when the successive observations
are dependent, i.e., when the conditional distribution of the $i$th
observation $x_i$ is affected by the outcome of the preceding
observations $x_1, \ldots, x_{i-1}$. If the successive observations are
not independent, the probability that a sample $(x_1, \ldots, x_m)$ will
be obtained, i.e., the joint distribution of $(x_1, \ldots, x_m)$, is no
longer given by the product
$f(x_1, \theta) f(x_2, \theta) \cdots f(x_m, \theta)$, but by a more
general function $p_m(x_1, \ldots, x_m)$. Thus, in dealing with
dependent observations, the null hypothesis $H_0$ will be the statement
that the distribution of the sample $(x_1, \ldots, x_m)$ is given by
some function $p_{0m}(x_1, \ldots, x_m)$, and the alternative hypothesis
$H_1$ will be the statement that this distribution is given by some
other function $p_{1m}(x_1, \ldots, x_m)$. We can construct the
sequential probability ratio test for testing $H_0$ against $H_1$ in the
same way as for independent observations. That is to say, we select two
constants $A$ and $B$ ($B < A$) and continue taking observations as long
as

$$B < \frac{p_{1m}(x_1, \ldots, x_m)}{p_{0m}(x_1, \ldots, x_m)} < A$$

The first time that the probability ratio $p_{1m}/p_{0m}$ is $\ge A$ or
$\le B$, we terminate the sequential process. $H_0$ is accepted if
$p_{1m}/p_{0m} \le B$ and $H_1$ is accepted if $p_{1m}/p_{0m} \ge A$.
The fundamental inequalities (3:12) and (3:13) remain valid for such a
test procedure in spite of the dependence of the successive
observations, provided that the probability is one that the procedure
will eventually terminate. It can be shown that for a very general class
of joint distributions $p_{0m}(x_1, \ldots, x_m)$ and
$p_{1m}(x_1, \ldots, x_m)$ the probability is one that the procedure
will eventually terminate. Thus, the validity of the inequalities (3:12)
and (3:13) is by no means restricted to the case of independent
observations. They are generally valid also for dependent observations.

⁴ See Section A.1 in the Appendix.

A simple case of dependent observations arises when we sample from
a finite population. Suppose, for example, that a lot consisting of $N$
units of a manufactured product is submitted for acceptance inspection.
Let $D$ be the number of defectives in the lot, which is assumed to be
unknown. To each defective unit we assign the value 1 and to each
non-defective unit the value 0. Then the distribution of a single
observation $x$ is given by $f(x, p)$ where $f(1, p) = p$,
$f(0, p) = 1 - p$, and $p = D/N$. The successive observations are,
however, not independent. For example, if $x_1 = 1$, the distribution of
$x_2$ is given by

$$f\left(x_2, \frac{D - 1}{N - 1}\right)$$

while if $x_1 = 0$, the distribution of $x_2$ is given by

$$f\left(x_2, \frac{D}{N - 1}\right)$$

If we denote by $d_i$ the number of defectives (the number of ones) in
the set of the first $i$ observations $x_1, \ldots, x_i$, the joint
distribution of $(x_1, \ldots, x_m)$ is given by⁵

$$p_m = f\left(x_1, \frac{D}{N}\right) f\left(x_2, \frac{D - d_1}{N - 1}\right) f\left(x_3, \frac{D - d_2}{N - 2}\right) \cdots f\left(x_m, \frac{D - d_{m-1}}{N - m + 1}\right) \tag{3:18}$$

Suppose that the hypothesis $H_0$ is that $D$ is equal to some specified
value $D_0$, and $H_1$ is the hypothesis that $D$ is equal to some value
$D_1$ ($D_1 > D_0$). Then the distribution of $(x_1, \ldots, x_m)$ under
$H_0$ is given by

$$p_{0m} = f\left(x_1, \frac{D_0}{N}\right) f\left(x_2, \frac{D_0 - d_1}{N - 1}\right) \cdots f\left(x_m, \frac{D_0 - d_{m-1}}{N - m + 1}\right) \tag{3:19}$$

and the distribution under $H_1$ by

$$p_{1m} = f\left(x_1, \frac{D_1}{N}\right) f\left(x_2, \frac{D_1 - d_1}{N - 1}\right) \cdots f\left(x_m, \frac{D_1 - d_{m-1}}{N - m + 1}\right) \tag{3:20}$$

The sequential probability ratio test for testing $H_0$ against $H_1$ is
based on the ratio $p_{1m}/p_{0m}$. Inspection continues as long as
$B < p_{1m}/p_{0m} < A$. The lot is accepted if $p_{1m}/p_{0m} \le B$
and the lot is rejected if $p_{1m}/p_{0m} \ge A$. The fundamental
inequalities (3:12) and (3:13) remain valid for this test procedure in
spite of the dependence of the observations.
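
The ratio of (3:20) to (3:19) can be accumulated one unit at a time, since the common denominators $N - i$ cancel. The following sketch is an illustration under assumptions: the function name `lot_log_ratio` is hypothetical, and returning plus or minus infinity when an outcome is impossible under one hypothesis is a convention chosen here, not something stated in the text.

```python
import math


def lot_log_ratio(xs, N, D0, D1):
    """log(p_1m / p_0m) for units drawn without replacement from a lot of N.

    xs holds 1 for a defective unit and 0 for a non-defective unit.
    """
    log_ratio = 0.0
    d = 0  # defectives found among the units inspected so far
    for i, x in enumerate(xs):
        remaining = N - i  # units left in the lot before this draw
        if x == 1:
            num, den = D1 - d, D0 - d
        else:
            num, den = remaining - (D1 - d), remaining - (D0 - d)
        if den <= 0:
            return math.inf   # outcome impossible under H0
        if num <= 0:
            return -math.inf  # outcome impossible under H1
        log_ratio += math.log(num / den)
        d += x
    return log_ratio
```

The returned value is then compared with $\log B$ and $\log A$ after each inspected unit, exactly as in the independent case.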


⁵ This formula is valid as long as $d_{m-1} \le D$.


3.3 Determination of the Constants A and B in Practice

Suppose that we wish to have a test procedure of strength
$(\alpha, \beta)$. Then our problem is to determine the constants $A$
and $B$ such that the resulting test will have the desired strength
$(\alpha, \beta)$. Let us denote by $A(\alpha, \beta)$ and
$B(\alpha, \beta)$ the values of $A$ and $B$, respectively, for which
the test has the required strength $(\alpha, \beta)$. The exact
determination of the values $A(\alpha, \beta)$ and $B(\alpha, \beta)$ is
usually very laborious.⁶ However, the fundamental inequalities derived
in the preceding section permit an approximate determination of $A$ and
$B$ which will suffice for most practical purposes. From (3:9) and
(3:11) it follows that

$$A(\alpha, \beta) \le \frac{1 - \beta}{\alpha} \tag{3:21}$$

and

$$B(\alpha, \beta) \ge \frac{\beta}{1 - \alpha} \tag{3:22}$$

We shall propose to put $A = (1 - \beta)/\alpha = a(\alpha, \beta)$,
say, and $B = \beta/(1 - \alpha) = b(\alpha, \beta)$, say, and we shall
investigate the consequences of this determination of $A$ and $B$. From
(3:21) and (3:22) it follows that the value $a(\alpha, \beta)$ chosen
for $A$ is greater than or equal to the exact value $A(\alpha, \beta)$,
and the value $b(\alpha, \beta)$ chosen for $B$ is less than or equal to
the exact value $B(\alpha, \beta)$. Then, letting
$A = a(\alpha, \beta)$ instead of $A(\alpha, \beta)$ and
$B = b(\alpha, \beta)$ instead of $B(\alpha, \beta)$ will, in general,
change the probabilities of errors of the first and second kinds. If $A$
were put equal to a value greater than $A(\alpha, \beta)$, and if $B$
were put equal to $B(\alpha, \beta)$, then the resulting probability of
an error of the first kind would be less than $\alpha$, but the
probability of an error of the second kind would be slightly larger than
$\beta$. Similarly, if we were to use the exact value $A(\alpha, \beta)$
for $A$, but a value $B$ below the exact value $B(\alpha, \beta)$, the
resulting probability of an error of the second kind would be less than
$\beta$, and the probability of an error of the first kind would be
slightly greater than $\alpha$. Thus, if a value $A$ is used which is
higher than the exact value $A(\alpha, \beta)$ and a value $B$ is used
which is lower than the exact value $B(\alpha, \beta)$, it is not clear
what the resulting effect on the probabilities of errors of the first
and second kinds will be. Let us denote by $\alpha'$ and $\beta'$ the
resulting probabilities of errors of the first and second kinds when
$A = a(\alpha, \beta)$ and $B = b(\alpha, \beta)$. From (3:12) and
(3:13) it follows that

$$\frac{\alpha'}{1 - \beta'} \le \frac{1}{a(\alpha, \beta)} = \frac{\alpha}{1 - \beta} \tag{3:23}$$

and

$$\frac{\beta'}{1 - \alpha'} \le b(\alpha, \beta) = \frac{\beta}{1 - \alpha} \tag{3:24}$$

⁶ The results in Section A.4 of the Appendix can be used for deriving
arbitrarily close approximations to the values $A(\alpha, \beta)$ and
$B(\alpha, \beta)$.

From these inequalities it follows that

$$\alpha' \le \frac{\alpha}{1 - \beta} \tag{3:25}$$

and

$$\beta' \le \frac{\beta}{1 - \alpha} \tag{3:26}$$

Multiplying (3:23) by $(1 - \beta)(1 - \beta')$ and (3:24) by
$(1 - \alpha)(1 - \alpha')$ and adding the two resulting inequalities,
we obtain

$$\alpha' + \beta' \le \alpha + \beta \tag{3:27}$$

Inequalities (3:25), (3:26), and (3:27) give valuable upper limits for
$\alpha'$ and $\beta'$. The values $\alpha$ and $\beta$ will usually be
small in practical applications. Most frequently they will lie in the
range from .01 to .05. Thus, $\alpha/(1 - \beta)$ and
$\beta/(1 - \alpha)$ will be very nearly equal to $\alpha$ and $\beta$,
respectively. It follows then from (3:25) and (3:26) that the amount by
which $\alpha'$ may exceed $\alpha$, or $\beta'$ may exceed $\beta$, is
very small and can be neglected for all practical purposes. Moreover,
(3:27) shows that at least one of the inequalities $\alpha' \le \alpha$
and $\beta' \le \beta$ must hold exactly. In other words, by using
$a(\alpha, \beta)$ and $b(\alpha, \beta)$ instead of $A(\alpha, \beta)$
and $B(\alpha, \beta)$, respectively, at most one of the probabilities
$\alpha$ and $\beta$ may be increased.

Thus, we may conclude: The use of $a(\alpha, \beta)$ and
$b(\alpha, \beta)$ instead of $A(\alpha, \beta)$ and $B(\alpha, \beta)$,
respectively, cannot result in any appreciable increase in the value of
either $\alpha$ or $\beta$. In other words, for all practical purposes
the test corresponding to $A = a(\alpha, \beta)$ and
$B = b(\alpha, \beta)$ provides at least the same protection against
wrong decisions as the test corresponding to $A = A(\alpha, \beta)$ and
$B = B(\alpha, \beta)$.

Our discussion so far leaves still open the possibility that the use
of $a(\alpha, \beta)$ and $b(\alpha, \beta)$ instead of
$A(\alpha, \beta)$ and $B(\alpha, \beta)$, respectively, may result in
an appreciable decrease of $\alpha$, or $\beta$, or both. If this were
so, it would mean only that the test corresponding to
$A = a(\alpha, \beta)$ and $B = b(\alpha, \beta)$ would provide a better
protection against wrong decisions than the test corresponding to
$A = A(\alpha, \beta)$ and $B = B(\alpha, \beta)$. Thus, the only
disadvantage that may arise from using $a(\alpha, \beta)$ and
$b(\alpha, \beta)$ instead of $A(\alpha, \beta)$ and $B(\alpha, \beta)$,
respectively, is that it may result in an appreciable increase in the
number of observations required by the test. In fact, since
$a(\alpha, \beta) \ge A(\alpha, \beta)$ and
$b(\alpha, \beta) \le B(\alpha, \beta)$, the number of observations
required by the test corresponding to $A = a(\alpha, \beta)$ and
$B = b(\alpha, \beta)$ can never be smaller than the number of
observations required by the test corresponding to
$A = A(\alpha, \beta)$ and $B = B(\alpha, \beta)$. Thus, if the increase
in the necessary number of observations caused by the use of
$a(\alpha, \beta)$ and $b(\alpha, \beta)$ instead of $A(\alpha, \beta)$
and $B(\alpha, \beta)$ can be shown to be only slight and of no
practical consequence, the test corresponding to $A = a(\alpha, \beta)$
and $B = b(\alpha, \beta)$ serves the purpose just as well, and the
determination of the exact values $A(\alpha, \beta)$ and
$B(\alpha, \beta)$ is of little interest.

We shall now indicate the reasons why the increase in the necessary
number of observations caused by the use of $a(\alpha, \beta)$ and
$b(\alpha, \beta)$ instead of the exact values $A(\alpha, \beta)$ and
$B(\alpha, \beta)$ will generally be only slight.⁷ The reason that
(3:21) and (3:22) are inequalities instead of equalities is that the
sequential process may terminate with $p_{1m}/p_{0m} > A$ or
$p_{1m}/p_{0m} < B$. If at the final stage $p_{1m}/p_{0m}$ were exactly
equal to $A$ or $B$, then $A(\alpha, \beta)$ and $B(\alpha, \beta)$
would be exactly equal to $(1 - \beta)/\alpha$ and $\beta/(1 - \alpha)$,
respectively. On the other hand, a possible excess of $p_{1m}/p_{0m}$
over the boundaries $A$ and $B$ at the termination of the test procedure
is caused only by the discontinuity of the number of observations, i.e.,
by the fact that the number of observations can take only integral
values. Thus, if fractional observations were possible, i.e., if the
number $m$ of observations were a continuous variable, $p_{1m}/p_{0m}$
would also be a continuous function of $m$ and consequently
$A(\alpha, \beta)$ and $B(\alpha, \beta)$ would be exactly equal to
$a(\alpha, \beta)$ and $b(\alpha, \beta)$, respectively. That the
increase in the necessary number of trials caused by the use of
$a(\alpha, \beta)$ and $b(\alpha, \beta)$ will generally be slight is
strongly indicated by the fact that the discrepancy between
$A(\alpha, \beta)$ and $a(\alpha, \beta)$, as well as that between
$B(\alpha, \beta)$ and $b(\alpha, \beta)$, arises only from the
discontinuity of the number of observations. In Section 3.9 we give
upper estimates of the increase in the expected number of trials caused
by the use of $a(\alpha, \beta)$ and $b(\alpha, \beta)$. Numerical
computations given in that section show that the increase is slight. It
may be added that the nearer the distribution $f(x, \theta_1)$ is to the
distribution $f(x, \theta_0)$ the smaller will be this increase in the
expected number of trials. The reason for this is that the nearer
$f(x, \theta_1)$ is to $f(x, \theta_0)$, the smaller the expected excess
of $p_{1m}/p_{0m}$ over the boundaries $A$ and $B$ and, therefore, also
the smaller the discrepancy between $A(\alpha, \beta)$ and
$a(\alpha, \beta)$ as well as that between $B(\alpha, \beta)$ and
$b(\alpha, \beta)$. If $f(x, \theta_1)$ approaches $f(x, \theta_0)$ the
exact values $A(\alpha, \beta)$ and $B(\alpha, \beta)$ converge to
$a(\alpha, \beta)$ and $b(\alpha, \beta)$, respectively.

Hence, if experimentation is not excessively costly, for all practical
purposes the following procedure may be adopted: If a sequential test
is desired such that the probability of an error of the first kind does
not exceed $\alpha$, and the probability of an error of the second kind
does not exceed $\beta$, put $A = (1 - \beta)/\alpha$ and
$B = \beta/(1 - \alpha)$ and carry out the sequential probability ratio
test as defined by the inequalities (3:1), (3:2), and (3:3).
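
The recommended choice of constants, together with the upper limits (3:25) and (3:26) on the error probabilities actually realized, amounts to a few lines of arithmetic. The sketch below is illustrative; the function names `wald_boundaries` and `realized_error_limits` are hypothetical.

```python
def wald_boundaries(alpha, beta):
    """Approximate constants A = (1 - beta)/alpha and B = beta/(1 - alpha)."""
    a = (1 - beta) / alpha
    b = beta / (1 - alpha)
    return a, b


def realized_error_limits(alpha, beta):
    """Upper limits (3:25) and (3:26) for the realized alpha' and beta'."""
    return alpha / (1 - beta), beta / (1 - alpha)
```

For $\alpha = \beta = .05$ this gives $A = 19$ and $B = 1/19$, and neither realized error probability can exceed $.05/.95$, i.e., about .0526.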

⁷ For a more complete discussion see Section 3.9.

The fact that for practical purposes we may put $A = a(\alpha, \beta)$
and $B = b(\alpha, \beta)$ brings out a surprising feature of the
sequential test as compared with current tests. Whereas current tests
cannot be carried out without finding the probability distribution of
the statistic on which the test is based, there are no distribution
problems in carrying out a sequential test. In fact, $a(\alpha, \beta)$
and $b(\alpha, \beta)$ depend on $\alpha$ and $\beta$ only, and the
ratio $p_{1m}/p_{0m}$ can be calculated from the data of the problem
without solving any distribution problems. Distribution problems arise
in connection with the sequential process only if it is desired to find
the probability distribution of the number of trials necessary for
reaching a final decision. But this is of secondary importance as long
as we know that the sequential test on the average leads to a saving in
the number of observations.


3.4 The OC Function of the Sequential Probability Ratio Test⁸


Since the sequential probability ratio test for testing the hypothesis
$H_0$ against the hypothesis $H_1$ will be applied to problems in which
the parameter $\theta$ can take values other than $\theta_0$ and
$\theta_1$, it is of interest to derive the whole operating
characteristic function $L(\theta)$ of the test. For convenience, we
shall treat the case of a single unknown parameter $\theta$ in this
section and in Section 3.5. The results can be extended without
difficulty to any number of parameters. In Section 2.2.1, $L(\theta)$
has been defined as the probability that the sequential process will
terminate with the acceptance of $H_0$ when $\theta$ is the true value
of the parameter. In this section we shall indicate the derivation of an
approximation formula for $L(\theta)$, neglecting the excess of
$p_{1m}/p_{0m}$ over the boundaries $A$ and $B$ at the termination of
the process. A rigorous derivation (using a different method) together
with upper and lower limits for the OC function will be given in Section
A.2.3 of the Appendix.

Consider the expression

$$\left[\frac{f(x, \theta_1)}{f(x, \theta_0)}\right]^{h(\theta)} \tag{3:28}$$

For each value $\theta$, the value of $h(\theta)$ is determined so that
$h(\theta) \ne 0$ and the expected value of the expression (3:28) is
equal to 1, i.e.,

$$\int_{-\infty}^{+\infty} \left[\frac{f(x, \theta_1)}{f(x, \theta_0)}\right]^{h(\theta)} f(x, \theta)\, dx = 1 \tag{3:29a}$$

if $f(x, \theta)$ is the probability density function, or

$$\sum_{x} \left[\frac{f(x, \theta_1)}{f(x, \theta_0)}\right]^{h(\theta)} f(x, \theta) = 1 \tag{3:29b}$$

if $x$ has a discrete distribution (the summation is taken over all
possible values of $x$). It is shown in Section A.2.1 of the Appendix
that under some slight restriction on the nature of the distribution
function $f(x, \theta)$, there exists exactly one value
$h(\theta) \ne 0$ such that (3:29) is fulfilled.

⁸ As mentioned in the Introduction, the operating characteristic
function for the special case of a binomial distribution was found by
Milton Friedman and George W. Brown independently of each other, and
slightly earlier by C. M. Stockman in England. The derivation of the OC
function in the general case is due to the author.

Hence, for any given value $\theta$, the function of $x$ given by

$$f^*(x, \theta) = \left[\frac{f(x, \theta_1)}{f(x, \theta_0)}\right]^{h(\theta)} f(x, \theta) \tag{3:30}$$

is a distribution function.

Since $h(\theta) \ne 0$, there are two possibilities: $h(\theta) > 0$
or $h(\theta) < 0$. We shall first consider the case when
$h(\theta) > 0$.

Let $H$ denote the hypothesis that $f(x, \theta)$ is the true
distribution of $x$ and $H^*$ the hypothesis that $f^*(x, \theta)$ is
the true distribution of $x$. Consider the sequential probability ratio
test $S^*$ for testing $H$ against $H^*$ defined as follows: Continue
taking observations as long as

$$B^{h(\theta)} < \frac{f^*(x_1, \theta) \cdots f^*(x_m, \theta)}{f(x_1, \theta) \cdots f(x_m, \theta)} < A^{h(\theta)} \tag{3:31}$$

Accept the hypothesis $H$ if

$$\frac{f^*(x_1, \theta) \cdots f^*(x_m, \theta)}{f(x_1, \theta) \cdots f(x_m, \theta)} \le B^{h(\theta)} \tag{3:32}$$

Reject the hypothesis $H$ (accept $H^*$) if

$$\frac{f^*(x_1, \theta) \cdots f^*(x_m, \theta)}{f(x_1, \theta) \cdots f(x_m, \theta)} \ge A^{h(\theta)} \tag{3:33}$$

Since

$$\frac{f^*(x, \theta)}{f(x, \theta)} = \left[\frac{f(x, \theta_1)}{f(x, \theta_0)}\right]^{h(\theta)} \tag{3:34}$$

and since $h(\theta) > 0$, the inequalities (3:31), (3:32), and (3:33)
are equivalent to

$$B < \frac{f(x_1, \theta_1) \cdots f(x_m, \theta_1)}{f(x_1, \theta_0) \cdots f(x_m, \theta_0)} < A \tag{3:35}$$

$$\frac{f(x_1, \theta_1) \cdots f(x_m, \theta_1)}{f(x_1, \theta_0) \cdots f(x_m, \theta_0)} \le B \tag{3:36}$$

and

$$\frac{f(x_1, \theta_1) \cdots f(x_m, \theta_1)}{f(x_1, \theta_0) \cdots f(x_m, \theta_0)} \ge A \tag{3:37}$$

respectively. But these inequalities are identical with those defining
the sequential probability ratio test $S$ for testing $H_0$ against
$H_1$, when the constants $A$ and $B$ are used. Thus, if the test $S^*$
leads to the acceptance of $H$, the test $S$ leads to the acceptance of
$H_0$, and if $S^*$ leads to the rejection of $H$, then $S$ also leads
to the rejection of $H_0$. From this, it follows that the probability of
accepting $H_0$ when $\theta$ is true, i.e., the value of $L(\theta)$,
is the same as the probability that the test $S^*$ will lead to the
acceptance of $H$ when $f(x, \theta)$ is the true distribution of $x$.

To calculate the latter probability we shall apply the formulas (3:9)
and (3:11) to the test procedure $S^*$. Denote by $\alpha'$ the
probability that $S^*$ will lead to the rejection of $H$ when $H$ is
true, and by $\beta'$ the probability that $S^*$ leads to the acceptance
of $H$ when $H^*$ is true. Applying the formulas (3:9) and (3:11) to the
test procedure $S^*$ we obtain

$$A^{h(\theta)} \le \frac{1 - \beta'}{\alpha'} \tag{3:38}$$

and

$$B^{h(\theta)} \ge \frac{\beta'}{1 - \alpha'} \tag{3:39}$$

When the excess over the boundaries at the termination of the process
is neglected, the equality sign holds in (3:38) and (3:39), that is,⁹

$$A^{h(\theta)} \approx \frac{1 - \beta'}{\alpha'} \tag{3:40}$$

and

$$B^{h(\theta)} \approx \frac{\beta'}{1 - \alpha'} \tag{3:41}$$

From (3:40) and (3:41) we obtain

$$\alpha' \approx \frac{1 - B^{h(\theta)}}{A^{h(\theta)} - B^{h(\theta)}} \tag{3:42}$$

Since $\alpha' = 1 - L(\theta)$, we get

$$L(\theta) \approx \frac{A^{h(\theta)} - 1}{A^{h(\theta)} - B^{h(\theta)}} \tag{3:43}$$

The case $h(\theta) < 0$ can be treated in a similar way. We obtain the
same result, i.e., the approximation formula (3:43) remains valid also
when $h(\theta) < 0$.

⁹ The symbol $\approx$ indicates an approximate equality.

It is interesting to note that $h(\theta_0) = 1$ and
$h(\theta_1) = -1$. This follows easily from (3:29b).

As an illustration, we shall determine $L(\theta)$ for the binomial
case when $x$ can take only the values 0 and 1 and the distribution
$f(x, \theta)$ is given as follows: $f(1, \theta) = \theta$ and
$f(0, \theta) = 1 - \theta$. Then equation (3:29b) can be written as

$$\theta \left(\frac{\theta_1}{\theta_0}\right)^{h(\theta)} + (1 - \theta) \left(\frac{1 - \theta_1}{1 - \theta_0}\right)^{h(\theta)} = 1 \tag{3:44}$$

To plot the OC function, it is not necessary to solve equation (3:44)
with respect to $h(\theta)$. We may consider $h = h(\theta)$ a parameter
and solve (3:44) with respect to $\theta$. Then we obtain

$$\theta = \frac{1 - \left(\frac{1 - \theta_1}{1 - \theta_0}\right)^{h}}{\left(\frac{\theta_1}{\theta_0}\right)^{h} - \left(\frac{1 - \theta_1}{1 - \theta_0}\right)^{h}} \tag{3:45}$$

If we let $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$, (3:43)
can be written as

$$L(\theta) \approx \frac{\left(\frac{1 - \beta}{\alpha}\right)^{h} - 1}{\left(\frac{1 - \beta}{\alpha}\right)^{h} - \left(\frac{\beta}{1 - \alpha}\right)^{h}} \tag{3:46}$$

For any arbitrarily chosen value $h$, the point $[\theta, L(\theta)]$,
computed from (3:45) and (3:46), will be a point on the OC function. The
OC function can be drawn by plotting a sufficiently large number of
points $[\theta, L(\theta)]$ corresponding to various values of $h$.
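
The parametric construction via (3:45) and (3:46) can be sketched as follows. The function name `oc_point_binomial` is hypothetical, and the sketch assumes $h \ne 0$, since both formulas take the indeterminate form 0/0 at $h = 0$.

```python
def oc_point_binomial(h, theta0, theta1, alpha, beta):
    """Return the point (theta, L(theta)) for a nonzero parameter value h."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    r1 = (theta1 / theta0) ** h              # ratio of f(1, .) raised to h
    r0 = ((1 - theta1) / (1 - theta0)) ** h  # ratio of f(0, .) raised to h
    theta = (1 - r0) / (r1 - r0)             # equation (3:45)
    L = (A ** h - 1) / (A ** h - B ** h)     # equation (3:46)
    return theta, L


# Points for a grid of nonzero h values, e.g. h = -3.0, -2.5, ..., 3.0
curve = [oc_point_binomial(k / 2, 0.1, 0.3, 0.05, 0.05)
         for k in range(-6, 7) if k != 0]
```

As a check on the formulas, $h = 1$ reproduces the point $(\theta_0, 1 - \alpha)$ and $h = -1$ reproduces $(\theta_1, \beta)$, in agreement with $h(\theta_0) = 1$ and $h(\theta_1) = -1$.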

A typical OC function for the binomial case is shown in Fig. 10.

We shall now compute $L(\theta)$ when $x$ is normally distributed with
unknown mean $\theta$ and known variance $\sigma^2$. In this case we
have

$$f(x, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(x - \theta)^2}$$

The quantity $h(\theta)$ is the non-zero root of the equation

$$\int_{-\infty}^{+\infty} \left[\frac{e^{-\frac{1}{2\sigma^2}(x - \theta_1)^2}}{e^{-\frac{1}{2\sigma^2}(x - \theta_0)^2}}\right]^{h(\theta)} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(x - \theta)^2}\, dx = 1 \tag{3:47}$$

Evaluating the above integral and solving the equation with respect
to $h(\theta)$, we obtain

$$h(\theta) = \frac{\theta_1 + \theta_0 - 2\theta}{\theta_1 - \theta_0} \tag{3:48}$$

An approximation to the OC function is obtained from (3:43) by
substituting $(\theta_1 + \theta_0 - 2\theta)/(\theta_1 - \theta_0)$
for $h(\theta)$.
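
A minimal sketch of this substitution is given below. The function name `oc_normal` is hypothetical. The value returned at $h(\theta) = 0$ is not taken from the text: it is the limit of (3:43) as $h \to 0$, namely $\log A/(\log A - \log B)$, included here only so that the function is defined at the midpoint of $\theta_0$ and $\theta_1$.

```python
import math


def oc_normal(theta, theta0, theta1, A, B):
    """Approximate L(theta) from (3:43) with h(theta) given by (3:48)."""
    h = (theta1 + theta0 - 2 * theta) / (theta1 - theta0)
    if h == 0:
        # Limiting value of (3:43) as h -> 0 (an added convention, see above)
        return math.log(A) / (math.log(A) - math.log(B))
    return (A ** h - 1) / (A ** h - B ** h)
```

With $A = 19$ and $B = 1/19$ the function returns .95 at $\theta_0$ and .05 at $\theta_1$. For values of $\theta$ far outside the interval, $A^h$ may overflow in floating point; the sketch makes no attempt to guard against that.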


3.5 The ASN Function of a Sequential Probability Ratio Test

Let $n$ denote the number of observations required by the test and
let $E_\theta(n)$ be the expected value of $n$ when $\theta$ is the true
value of the parameter. This expected value $E_\theta(n)$ is a function
of $\theta$ which we have called the average sample number function, or
briefly the ASN function. In this section we shall outline the
derivation of an approximation formula for the ASN function, neglecting
the excess of $p_{1m}/p_{0m}$ over the boundaries $A$ and $B$ at the
termination of the sequential process. A more complete discussion
together with upper and lower limits for the ASN function is given in
Section A.3 of the Appendix.

Let $N$ be an integer sufficiently large to allow the probability that
$n \ge N$ to be neglected.¹⁰ Thus we shall assume that $n < N$. Then
we can write

$$z_1 + \cdots + z_N = (z_1 + \cdots + z_n) + (z_{n+1} + \cdots + z_N) \tag{3:49}$$

where

$$z_i = \log \frac{f(x_i, \theta_1)}{f(x_i, \theta_0)} \tag{3:50}$$

¹⁰ It is shown in Section A.3.1 that no error is involved in assuming
this, since we pass to the limit when $N$ approaches $\infty$.

Taking expected values on both sides of (3:49), we obtain

$$N E(z) = E(z_1 + \cdots + z_n) + E(z_{n+1} + \cdots + z_N) \tag{3:51}$$

where

$$z = \log \frac{f(x, \theta_1)}{f(x, \theta_0)} \tag{3:52}$$

Since, for $i > n$, the random variable $z_i$ is distributed
independently of $n$, the expected value of $z_{n+1} + \cdots + z_N$ is
equal to the expected value of $(N - n)$ times the expected value of a
single $z$, i.e.,

$$E(z_{n+1} + \cdots + z_N) = E(N - n) E(z) = N E(z) - E(n) E(z) \tag{3:53}$$

From (3:51) and (3:53) it follows that

$$E(z_1 + \cdots + z_n) - E(n) E(z) = 0 \tag{3:54}$$

Hence

$$E(n) = \frac{E(z_1 + \cdots + z_n)}{E(z)} \tag{3:55}$$

if $E(z) \ne 0$.

If $\theta$ is the true value of the parameter, then
$E(n) = E_\theta(n)$ by the definition of the symbol $E_\theta(n)$. We
shall denote by $E_\theta(z)$ the expected value $E(z)$ of $z$ when
$\theta$ is the true value of the parameter. If the excess of the
probability ratio $p_{1m}/p_{0m}$ over the boundaries $A$ and $B$ at the
termination of the sequential process is neglected, the random variable
$(z_1 + \cdots + z_n)$ can take only the values $\log A$ and $\log B$
with the probabilities $1 - L(\theta)$ and $L(\theta)$, respectively.
Hence

$$E(z_1 + \cdots + z_n) \approx L(\theta) \log B + [1 - L(\theta)] \log A \tag{3:56}$$

From (3:55) and (3:56) we obtain the approximation formula

$$E_\theta(n) \approx \frac{L(\theta) \log B + [1 - L(\theta)] \log A}{E_\theta(z)} \tag{3:57}$$

In the preceding section we have computed explicitly the formula
$L(\theta)$ for the binomial and normal case. Thus, to obtain the
explicit formula for $E_\theta(n)$, we need only compute $E_\theta(z)$.
In the binomial case, i.e., when $f(x, \theta) = \theta$ for $x = 1$ and
$f(x, \theta) = 1 - \theta$ for $x = 0$, we have

$$E_\theta(z) = E_\theta\left[\log \frac{f(x, \theta_1)}{f(x, \theta_0)}\right] = \theta \log \frac{f(1, \theta_1)}{f(1, \theta_0)} + (1 - \theta) \log \frac{f(0, \theta_1)}{f(0, \theta_0)} = \theta \log \frac{\theta_1}{\theta_0} + (1 - \theta) \log \frac{1 - \theta_1}{1 - \theta_0} \tag{3:58}$$

In the normal case, i.e., when

$$f(x, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}(x - \theta)^2}$$

we have

$$z = \log \frac{f(x, \theta_1)}{f(x, \theta_0)} = \frac{1}{2\sigma^2} \left[2(\theta_1 - \theta_0) x + \theta_0^2 - \theta_1^2\right] \tag{3:59}$$

Hence,

$$E_\theta(z) = \frac{1}{2\sigma^2} \left[2(\theta_1 - \theta_0) \theta + \theta_0^2 - \theta_1^2\right] \tag{3:60}$$
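
The approximation (3:57), together with the expected values (3:58) and (3:60), can be sketched as below. The function names are hypothetical, and the value of $L(\theta)$ is assumed to be supplied by the caller, for instance from (3:43).

```python
import math


def expected_z_binomial(theta, theta0, theta1):
    """E_theta(z) in the binomial case, equation (3:58)."""
    return (theta * math.log(theta1 / theta0)
            + (1 - theta) * math.log((1 - theta1) / (1 - theta0)))


def expected_z_normal(theta, theta0, theta1, sigma=1.0):
    """E_theta(z) in the normal case, equation (3:60)."""
    return (2 * (theta1 - theta0) * theta
            + theta0 ** 2 - theta1 ** 2) / (2 * sigma ** 2)


def approx_asn(L, expected_z, A, B):
    """Approximate E_theta(n) from (3:57); requires E_theta(z) != 0."""
    if expected_z == 0:
        raise ValueError("(3:57) does not apply when E(z) = 0")
    return (L * math.log(B) + (1 - L) * math.log(A)) / expected_z
```

For the normal case with $\theta_0 = 0$, $\theta_1 = 1$, $\sigma = 1$, $A = 19$, and $B = 1/19$, taking $\theta = \theta_1$ and $L(\theta_1) = .05$ gives an approximate average sample number of $1.8 \log 19$, i.e., about 5.3.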


3.6 Saving in the Number of Observations Effected by the Use of
the Sequential Probability Ratio Test instead of the Current
Test Procedure

In this section we shall assume that $H_0$ is the hypothesis that the
random variable $x$ under consideration is normally distributed with
mean $\theta_0$ and variance unity, while $H_1$ is the hypothesis that
$x$ is normally distributed with mean $\theta_1$ and variance unity. We
may assume without loss of generality that $\theta_0 < \theta_1$. We
shall compare the expected number of observations required by the
sequential probability ratio test of strength $(\alpha, \beta)$ for
testing $H_0$ against $H_1$ with the fixed number of observations needed
for the current most powerful test to attain the same strength
$(\alpha, \beta)$.

We shall denote by $n(\alpha, \beta)$ the fixed number of observations
required by the current test to attain the strength $(\alpha, \beta)$.
The current most powerful test procedure for testing $H_0$ against $H_1$
is carried out as follows. The hypothesis $H_0$ is accepted if the
arithmetic mean $\bar{x}$ of the observations $x_1, \ldots, x_n$ (the
number $n$ of observations is determined in advance) is less than or
equal to a preassigned constant $d$, and $H_0$ is rejected ($H_1$ is
accepted) if $\bar{x}$ exceeds $d$. The constant $d$ and the fixed
number $n$ of observations are to be determined so that the test will
have the required strength $(\alpha, \beta)$. For any given $n$ and $d$
the corresponding strength of the test can be determined as follows.
Since $\bar{x} \le d$ is equivalent to the inequality
$\sqrt{n}(\bar{x} - \theta_0) \le \sqrt{n}(d - \theta_0)$, the
probability that $\bar{x} \le d$ is the same as the probability that
$\sqrt{n}(\bar{x} - \theta_0) \le \sqrt{n}(d - \theta_0)$. The random
variable $y = \sqrt{n}(\bar{x} - \theta_0)$ is normally distributed with
mean 0 and variance unity if $H_0$ is true. Thus, the probability that
$\bar{x} \le d$ when $H_0$ is true, i.e., the probability that we shall
accept $H_0$ when $H_0$ is true, is equal to the probability that
$y \le \sqrt{n}(d - \theta_0)$. We shall denote by $G(t)$ the
probability that a normally distributed random variable with mean 0 and
variance unity will take a value less than $t$, i.e.,

$$G(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-\frac{x^2}{2}}\, dx \tag{3:61}$$

Then the probability that we shall accept $H_0$ when $H_0$ is true is
equal to $G[\sqrt{n}(d - \theta_0)]$. Since the probability that we
shall accept $H_0$ when $H_0$ is true is $1 - \alpha$ by definition, we
have

$$G[\sqrt{n}(d - \theta_0)] = 1 - \alpha \tag{3:62}$$

To determine the value of $\beta$ corresponding to given $n$ and $d$,
we shall write the inequality $\bar{x} \le d$ in the equivalent form
$\sqrt{n}(\bar{x} - \theta_1) \le \sqrt{n}(d - \theta_1)$. By
definition, $\beta$ is the probability that we shall accept $H_0$ when
$H_1$ is true. But the latter probability is the same as the probability
that $\bar{x} \le d$, i.e., that
$\sqrt{n}(\bar{x} - \theta_1) \le \sqrt{n}(d - \theta_1)$, when $H_1$ is
true. But when $H_1$ is true this probability is equal to
$G[\sqrt{n}(d - \theta_1)]$. Thus, we have

$$G[\sqrt{n}(d - \theta_1)] = \beta \tag{3:63}$$

Hence, to obtain a test of the required strength $(\alpha, \beta)$, we
have to choose the quantities $n$ and $d$ so that equations (3:62) and
(3:63) are fulfilled. Let $\lambda_0$ be the value for which
$G(\lambda_0) = 1 - \alpha$ and let $\lambda_1$ be the value for which
$G(\lambda_1) = \beta$. The values $\lambda_0$ and $\lambda_1$ can be
obtained from a table of the normal distribution. Then equations (3:62)
and (3:63) can be written as

$$\sqrt{n}(d - \theta_0) = \lambda_0 \tag{3:64}$$

and

$$\sqrt{n}(d - \theta_1) = \lambda_1 \tag{3:65}$$

Subtracting equation (3:64) from equation (3:65) we obtain

$$\sqrt{n}(\theta_0 - \theta_1) = \lambda_1 - \lambda_0 \tag{3:66}$$

Thus,

$$n = n(\alpha, \beta) = \frac{(\lambda_1 - \lambda_0)^2}{(\theta_0 - \theta_1)^2} \tag{3:67}$$

If this expression is not an integer, $n(\alpha, \beta)$ is the
smallest integer in excess.
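
Formula (3:67) is straightforward to evaluate once $\lambda_0$ and $\lambda_1$ are available. The sketch below obtains them from the inverse normal distribution function in Python's standard library rather than from a printed table; the function name `fixed_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def fixed_sample_size(alpha, beta, theta0, theta1):
    """n(alpha, beta) from (3:67), rounded up to the next integer."""
    standard_normal = NormalDist()  # mean 0, variance unity
    lam0 = standard_normal.inv_cdf(1 - alpha)  # G(lam0) = 1 - alpha
    lam1 = standard_normal.inv_cdf(beta)       # G(lam1) = beta
    n_exact = (lam1 - lam0) ** 2 / (theta0 - theta1) ** 2
    return math.ceil(n_exact)
```

For $\alpha = \beta = .05$ and $\theta_1 - \theta_0 = 1$, the expression (3:67) is about 10.8, so the current test needs 11 observations.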

We shall now determine the expected number of observations required by
the sequential probability ratio test of strength $(\alpha, \beta)$ and
we shall compare it with the fixed number $n(\alpha, \beta)$ of
observations required by the current test as given in formula (3:67).
In the sequential test we shall use the approximation formulas for $A$
and $B$, i.e., we shall let $A$ and $B$ equal $(1 - \beta)/\alpha$ and
$\beta/(1 - \alpha)$, respectively, instead of the exact values
$A(\alpha, \beta)$ and $B(\alpha, \beta)$, respectively. It has been
shown in Section 3.2 that $(1 - \beta)/\alpha \ge A(\alpha, \beta)$ and
$\beta/(1 - \alpha) \le B(\alpha, \beta)$. Thus, by letting
$A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$ instead of using
the exact values $A(\alpha, \beta)$ and $B(\alpha, \beta)$, we can only
increase the number of observations required by the sequential test.
Consequently, the saving effected by the sequential test of strength
$(\alpha, \beta)$ as compared with the current test cannot be smaller
than the saving which results from the sequential test obtained by using
the approximation formulas $A = (1 - \beta)/\alpha$ and
$B = \beta/(1 - \alpha)$.

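We shall assume that $|\theta_1 - \theta_0|$ is small so that the
approximation formula (3:57) for the expected value of $n$ can be used.
Since $L(\theta_0) = 1 - \alpha$ and $L(\theta_1) = \beta$, we obtain
from (3:57)

$$E_1(n) \approx \frac{\beta \log B + (1 - \beta) \log A}{E_1(z)} \tag{3:68}$$

and

$$E_0(n) \approx \frac{(1 - \alpha) \log B + \alpha \log A}{E_0(z)} \tag{3:69}$$

where $E_i(n)$ denotes the expected value of $n$ when $H_i$ is true
($i = 0, 1$). As can easily be verified,

$$E_1(z) = \frac{1}{2}(\theta_0 - \theta_1)^2 \tag{3:70}$$

and

$$E_0(z) = -\frac{1}{2}(\theta_0 - \theta_1)^2 \tag{3:71}$$

From (3:67), (3:68), (3:69), (3:70), and (3:71) we obtain

$$\frac{E_1(n)}{n(\alpha, \beta)} \approx \frac{2}{(\lambda_1 - \lambda_0)^2} \left[\beta \log B + (1 - \beta) \log A\right] \tag{3:72}$$

and

$$\frac{E_0(n)}{n(\alpha, \beta)} \approx \frac{2}{(\lambda_1 - \lambda_0)^2} \left[-(1 - \alpha) \log B - \alpha \log A\right] \tag{3:73}$$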

It is interesting to note that the ratios $E_1(n)/n(\alpha, \beta)$ and
$E_0(n)/n(\alpha, \beta)$ are independent of the parameter values
$\theta_0$ and $\theta_1$. The average saving of the sequential test as
compared with the current test is
$100\left[1 - \frac{E_1(n)}{n(\alpha, \beta)}\right]$ per cent if $H_1$
is true, and $100\left[1 - \frac{E_0(n)}{n(\alpha, \beta)}\right]$ per
cent if $H_0$ is true.

In Table 1, panel A shows the value of
$100\left[1 - \frac{E_1(n)}{n(\alpha, \beta)}\right]$, and panel B shows
the value of $100\left[1 - \frac{E_0(n)}{n(\alpha, \beta)}\right]$ for
several values of $\alpha$ and $\beta$. Because of the symmetry of the
normal distribution, panel B is obtained from panel A simply by
interchanging $\alpha$ and $\beta$.


TABLE 1

Average Percentage Saving in Size of Sample with Sequential Analysis,
as Compared with Current Most Powerful Test for Testing Mean
of a Normally Distributed Variate

A. When alternative hypothesis is true:

   $\beta$ \ $\alpha$     .01     .02     .03     .04     .05

        .01                58      60      61      62      63
        .02                54      56      57      58      59
        .03                51      53      54      55      55
        .04                49      50      51      52      53
        .05                47      49      50      50      51

B. When null hypothesis is true:

   $\beta$ \ $\alpha$     .01     .02     .03     .04     .05

        .01                58      54      51      49      47
        .02                60      56      53      50      49
        .03                61      57      54      51      50
        .04                62      58      55      52      50
        .05                63      59      55      53      51


As the table shows, for the range of $\alpha$ and $\beta$ from .01 to .05
(the range most frequently employed), the sequential test results in an
average saving of at least 47 per cent in the necessary number of
observations as compared with the current test. The true saving is
slightly higher than shown in the table, since $E_i(n)$ ($i = 0, 1$)
calculated under the condition that $A = (1-\beta)/\alpha$ and
$B = \beta/(1-\alpha)$ is greater than $E_i(n)$ calculated under the
condition that $A = A(\alpha,\beta)$ and $B = B(\alpha,\beta)$.
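
The percentages of Table 1 can be recomputed approximately from the ratios $E_1(n)/n(\alpha,\beta)$ and $E_0(n)/n(\alpha,\beta)$. The sketch below rests on two assumptions that should be checked against (3:67), which is not reproduced in this section: that $\lambda_0$ and $\lambda_1$ are the standard normal values with $G(\lambda_0) = 1 - \alpha$ and $G(\lambda_1) = \beta$, and that each ratio equals $2/(\lambda_1 - \lambda_0)^2$ times the corresponding bracket. The function name `percent_saving` is hypothetical.

```python
import math
from statistics import NormalDist


def percent_saving(alpha: float, beta: float) -> tuple[float, float]:
    """Return the average percentage saving (under H1, under H0)."""
    log_a = math.log((1 - beta) / alpha)   # approximate log A
    log_b = math.log(beta / (1 - alpha))   # approximate log B
    std = NormalDist()
    lam0 = std.inv_cdf(1 - alpha)          # assumed: G(lam0) = 1 - alpha
    lam1 = std.inv_cdf(beta)               # assumed: G(lam1) = beta
    scale = 2.0 / (lam1 - lam0) ** 2
    ratio_h1 = scale * (beta * log_b + (1 - beta) * log_a)      # E_1(n)/n(alpha, beta)
    ratio_h0 = scale * (-(1 - alpha) * log_b - alpha * log_a)   # E_0(n)/n(alpha, beta)
    return 100 * (1 - ratio_h1), 100 * (1 - ratio_h0)


levels = (0.01, 0.02, 0.03, 0.04, 0.05)
for beta in levels:
    # One row of panel A: beta fixed, alpha running over the columns.
    print(beta, [round(percent_saving(alpha, beta)[0]) for alpha in levels])
```

Entries recomputed in this way may differ from the printed ones by a unit in the last place, since the tabular values were obtained with the computing aids of the time.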
3.7 Lower Limit of the Probability That the Sequential Test Will
Terminate with a Number of Trials Less Than or Equal to a
Given Number


In Section A.6 an approximate formula¹¹ for the probability distribution
of the number of observations required by the sequential test is derived
in the case in which $z = \log \dfrac{f(x,\theta_1)}{f(x,\theta_0)}$ is
normally distributed. It is pointed out that the same distribution
function of $n$ can be regarded as an approximation to the exact
distribution even when $z$ is not normally distributed, provided that the
absolute value of $E(z)$ and the standard deviation of $z$ are
sufficiently small as compared with $\log A$ and $\log B$. Although the
distribution of $n$ given in Section A.6 could be used to determine the
probability that $n \le n_0$ for any fixed integer $n_0$, we shall prefer
to derive a lower limit for this probability by a different method for
the following reasons. (1) The computation of the lower limit given in
this section is very simple, whereas the use of the distribution function
given in Section A.6 would require laborious computations, since that
distribution function has not yet been tabulated. (2) If $n_0$ is fairly
large and if $\alpha$ and $\beta$ are small, as they usually are in
practice, the lower bound given in this section will be fairly near the
exact value.

For any given positive integer $n_0$ let $P_i(n \le n_0)$ denote the
probability that $n \le n_0$ when $H_i$ is true, i.e., when
$\theta = \theta_i$ ($i = 0, 1$).¹² We want to derive a lower bound for
$P_i(n \le n_0)$. It will be assumed that $n_0$ is sufficiently large so
that the sum $z_1 + \cdots + z_{n_0}$ may be regarded as normally
distributed even when the distribution of $z$ is not normal.¹³

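If $\sum_{\alpha=1}^{n_0} z_\alpha \ge \log A$, then we certainly have
$n \le n_0$. Similarly, if $\sum_{\alpha=1}^{n_0} z_\alpha \le \log B$,
we must have $n \le n_0$. (The summation index $\alpha$ is merely a
running index and is unrelated to the error probability $\alpha$.) Hence

(3:74)    $P_1\left(\sum_{\alpha=1}^{n_0} z_\alpha \ge \log A\right) \le P_1(n \le n_0)$

and

(3:75)    $P_0\left(\sum_{\alpha=1}^{n_0} z_\alpha \le \log B\right) \le P_0(n \le n_0)$
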

¹¹ See formulas (A:166), (A:183) and (A:194).

¹² In general, for any relation $R$ we use the symbol $P_i(R)$ to denote
the probability that $R$ holds when $H_i$ is true.

¹³ According to well-known theorems in the theory of probability, the sum
of a large number of independent random variables is nearly normally
distributed under very general conditions.

The inequality $\sum_{\alpha=1}^{n_0} z_\alpha \ge \log A$ can be written
as

(3:76)    $\dfrac{\sum_{\alpha=1}^{n_0} z_\alpha - n_0 E_1(z)}{\sqrt{n_0}\,\sigma_1(z)} \ge \dfrac{\log A - n_0 E_1(z)}{\sqrt{n_0}\,\sigma_1(z)}$

where $\sigma_1(z)$ denotes the standard deviation of $z$ when $H_1$ is
true. The left-hand member of (3:76) is normally distributed with mean 0
and variance unity when $H_1$ is true. For any value $\lambda$ we shall
denote by $G(\lambda)$ the probability that a normally distributed random
variable with mean 0 and variance unity will take a value less than
$\lambda$. Thus, the probability that such a random variable takes a
value $\ge \lambda$ is given by $1 - G(\lambda)$. Hence the probability
that (3:76) holds when $H_1$ is true is equal to
$1 - G[\lambda_1(n_0)]$ where

(3:77)    $\lambda_1(n_0) = \dfrac{\log A - n_0 E_1(z)}{\sqrt{n_0}\,\sigma_1(z)}$

But the probability that (3:76) holds when $H_1$ is true is equal to
$P_1\left(\sum_{\alpha=1}^{n_0} z_\alpha \ge \log A\right)$. Thus,

(3:78)    $P_1\left(\sum_{\alpha=1}^{n_0} z_\alpha \ge \log A\right) = 1 - G[\lambda_1(n_0)]$

Because of (3:74), we obtain

          $1 - G[\lambda_1(n_0)] \le P_1(n \le n_0)$

Thus, $1 - G[\lambda_1(n_0)]$ is a lower limit of the probability that
$n \le n_0$, when $H_1$ is true.

To obtain a lower limit for $P_0(n \le n_0)$, we rewrite the inequality
$\sum_{\alpha=1}^{n_0} z_\alpha \le \log B$ in the form

(3:79)    $\dfrac{\sum_{\alpha=1}^{n_0} z_\alpha - n_0 E_0(z)}{\sqrt{n_0}\,\sigma_0(z)} \le \dfrac{\log B - n_0 E_0(z)}{\sqrt{n_0}\,\sigma_0(z)} = \lambda_0(n_0)$, say

where $\sigma_0(z)$ denotes the standard deviation of $z$ when $H_0$ is
true. Since the left-hand member of (3:79) is normally distributed with
mean 0 and variance unity when $H_0$ is true, the probability that (3:79)
holds when $H_0$ is true is equal to $G[\lambda_0(n_0)]$. Hence,

(3:80)    $P_0\left(\sum_{\alpha=1}^{n_0} z_\alpha \le \log B\right) = G[\lambda_0(n_0)]$

Because of (3:75), we then have

(3:81)    $G[\lambda_0(n_0)] \le P_0(n \le n_0)$

Thus $G[\lambda_0(n_0)]$ is a lower bound of the probability that
$n \le n_0$ when $H_0$ is true.

When $\log A = \log [(1-\beta)/\alpha]$ and
$\log B = \log [\beta/(1-\alpha)]$, Table 2 shows the values of the lower
bounds of $P_0(n \le n_0)$ and $P_1(n \le n_0)$ corresponding to
different pairs $(\alpha,\beta)$ and different values of $n_0$. In these
calculations it has been assumed that the distribution under $H_0$ is a
normal distribution with mean 0 and variance unity, and the distribution
under $H_1$ is a normal distribution with mean $\theta$ and variance
unity. For each pair $(\alpha,\beta)$ the value of $\theta$ has been
determined from (3:67) so that the number of observations needed for the
current most powerful test of strength $(\alpha,\beta)$ is equal to 1000.

TABLE 2

Lower Bound of the Probability that a Sequential Analysis Will
Terminate within Various Numbers of Trials, when the Most
Powerful Current Test Requires Exactly 1000 Trials

           $\alpha = .01$ and $\beta = .01$    $\alpha = .01$ and $\beta = .05$    $\alpha = .05$ and $\beta = .05$
 Number    Alternative     Null                Alternative     Null                Alternative     Null
 of        hypothesis      hypothesis          hypothesis      hypothesis          hypothesis      hypothesis
 Trials    true            true                true            true                true            true

 1000      .910            .910                .799            .891                .773            .773
 1200      .950            .950                .871            .932                .837            .837
 1400      .972            .972                .916            .957                .883            .883
 1600      .985            .985                .946            .972                .915            .915
 1800      .991            .991                .965            .982                .938            .938
 2000      .995            .995                .977            .989                .955            .955
 2200      .997            .997                .985            .993                .967            .967
 2400      .999            .999                .990            .995                .976            .976
 2600      .999            .999                .994            .997                .982            .982
 2800      1.00            1.00                .996            .998                .987            .987
 3000      1.00            1.00                .997            .999                .990            .990

The probabilities given are lower bounds for the true probabilities. They
relate to a test of the mean of a normally distributed variate, the
difference between the null and alternative hypothesis being adjusted for
each pair of values of $\alpha$ and $\beta$ so that the number of trials
required under the most powerful current test is exactly 1000.
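
The lower bounds $1 - G[\lambda_1(n_0)]$ and $G[\lambda_0(n_0)]$ derived in this section can be evaluated numerically as in the sketch below. It assumes the normal case described for Table 2 (unit variance, means 0 and $\theta$), for which $E_1(z) = -E_0(z) = \theta^2/2$ and the standard deviation of $z$ is $\theta$; it further assumes that (3:67) gives $\theta = (\lambda_0 - \lambda_1)/\sqrt{1000}$ with $G(\lambda_0) = 1 - \alpha$ and $G(\lambda_1) = \beta$. The function name and the parameter `n_fixed` are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal; STD.cdf plays the role of G


def termination_lower_bounds(alpha, beta, n0, n_fixed=1000):
    """Return lower bounds (under H1, under H0) for the probability that n <= n0."""
    # Assumed form of (3:67): theta chosen so the fixed-sample test needs n_fixed trials.
    theta = (STD.inv_cdf(1 - alpha) - STD.inv_cdf(beta)) / math.sqrt(n_fixed)
    mean1 = 0.5 * theta ** 2        # E_1(z); E_0(z) = -mean1 in the normal case
    sigma = theta                   # standard deviation of z under H0 and H1
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    root = math.sqrt(n0) * sigma
    lam1 = (log_a - n0 * mean1) / root   # lambda_1(n0) of (3:77)
    lam0 = (log_b + n0 * mean1) / root   # lambda_0(n0) of (3:79), using E_0(z) = -mean1
    return 1 - STD.cdf(lam1), STD.cdf(lam0)


for n0 in (1000, 2000, 3000):
    print(n0, [round(p, 3) for p in termination_lower_bounds(0.01, 0.05, n0)])
```

For $\alpha = \beta$ the two bounds coincide, which is the symmetry visible in the first and last pairs of columns of Table 2.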
3.8 Truncation of the Sequential Test Procedure

Although it is shown in Section A.1 that the probability is 1 that the
sequential test procedure will eventually terminate, it is occasionally
desirable to set a definite upper limit, say $n_0$, for the number of
observations. This can be achieved by truncating the sequential process
at $n = n_0$, i.e., by giving a new rule for the acceptance or rejection
of $H_0$ at the $n_0$th trial if the sequential process did not lead to a
final decision for $n \le n_0$. A simple and reasonable rule for
truncation at the $n_0$th trial seems to be the following: If the
sequential probability ratio test does not lead to a final decision for
$n \le n_0$, accept $H_0$ at the $n_0$th trial when
$\log B < \sum_{\alpha=1}^{n_0} z_\alpha \le 0$, and reject $H_0$ when
$0 < \sum_{\alpha=1}^{n_0} z_\alpha < \log A$.
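
The truncation rule just stated may be sketched in code as follows. The sketch assumes that the increments $z_1, z_2, \cdots$ are supplied as a sequence of numbers and that the approximate boundaries $\log[(1-\beta)/\alpha]$ and $\log[\beta/(1-\alpha)]$ are used; the function name `truncated_sprt` and the decision labels are hypothetical.

```python
import math


def truncated_sprt(z_values, alpha, beta, n0):
    """Run the sequential test on increments z_1, z_2, ..., truncated at trial n0.

    Returns (decision, n) where decision is "accept H0" or "reject H0".
    """
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    total = 0.0
    for n, z in enumerate(z_values, start=1):
        total += z
        if total >= log_a:          # ordinary termination: reject H0
            return "reject H0", n
        if total <= log_b:          # ordinary termination: accept H0
            return "accept H0", n
        if n == n0:                 # no decision yet: apply the truncation rule
            return ("accept H0" if total <= 0 else "reject H0"), n
    raise ValueError("observations exhausted before a decision and before n0")


# Illustrative increments only: the cumulative sum stays inside the limits,
# so the decision is forced at n0 = 3 by the sign of the sum.
print(truncated_sprt([0.5, -0.2, 0.1], alpha=0.05, beta=0.05, n0=3))
```

The forced decision depends only on the sign of the cumulative sum at the $n_0$th trial, which is why the bounds on the error probabilities involve only the distribution of that sum.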

By truncating the sequential process at the $n_0$th trial we shall,
however, change the probabilities of errors of the first and second
kinds. Let $\alpha$ and $\beta$ be the probabilities of errors of the
first and second kinds if the sequential test is not truncated. The
effect of the truncation on $\alpha$ and $\beta$ will, of course, depend
on the value of $n_0$. The larger $n_0$, the smaller will be the effect
of truncation on $\alpha$ and $\beta$. We shall denote the resulting
probabilities of errors of the first and second kinds by $\alpha(n_0)$
and $\beta(n_0)$, respectively, if the sequential process is truncated at
$n = n_0$. In this section we shall derive upper bounds for
$\alpha(n_0)$ and $\beta(n_0)$.

To obtain an upper bound for $\alpha(n_0)$ we have to consider the cases
in which the truncated process leads to the rejection of $H_0$, while the
non-truncated process leads to the acceptance of $H_0$. Denote by
$\rho_0(n_0)$ the probability under $H_0$ of obtaining a sample such that
the truncated process leads to the rejection of $H_0$, while the
non-truncated process leads to the acceptance of $H_0$. Then, we clearly
have

(3:82)    $\alpha(n_0) \le \alpha + \rho_0(n_0)$

The reason that in (3:82) the inequality sign holds instead of the
equality sign is that there may be samples for which the truncated
process leads to the acceptance of $H_0$, while the non-truncated process
leads to the rejection of $H_0$. To obtain an upper bound for
$\alpha(n_0)$, we merely need to derive an upper bound for
$\rho_0(n_0)$. By definition, $\rho_0(n_0)$ is the probability under
$H_0$ that for the successive observations $z_1, z_2, \cdots$, etc., the
following three conditions are simultaneously fulfilled:

(i) $\log B < \sum_{\alpha=1}^{n} z_\alpha < \log A$ for $n = 1, \cdots, n_0 - 1$

(ii) $0 < \sum_{\alpha=1}^{n_0} z_\alpha < \log A$

(iii) When the sequential process is continued beyond $n_0$, it
terminates with the acceptance of $H_0$.


Denote by $\bar\rho_0(n_0)$ the probability under $H_0$ that condition
(ii) will be fulfilled, i.e.,

(3:83)    $\bar\rho_0(n_0) = P_0\left(0 < \sum_{\alpha=1}^{n_0} z_\alpha < \log A\right)$

Since the probability that condition (ii) is fulfilled cannot be smaller
than the probability that all three conditions are fulfilled
simultaneously, we have

          $\rho_0(n_0) \le \bar\rho_0(n_0)$

and, therefore,

(3:84)    $\alpha(n_0) \le \alpha + \bar\rho_0(n_0)$


Thus, $\alpha + \bar\rho_0(n_0)$ is an upper bound for $\alpha(n_0)$,
which can easily be computed, as will be shown later. To obtain an upper
bound for $\beta(n_0)$ we shall denote by $\rho_1(n_0)$ the probability
(under $H_1$) that the successive observations will be such that the
truncated process leads to the acceptance of $H_0$, while the
non-truncated process leads to the rejection of $H_0$. In other words,
$\rho_1(n_0)$ is the probability under $H_1$ that the successive
observations will satisfy the following three conditions simultaneously:

(i) $\log B < \sum_{\alpha=1}^{n} z_\alpha < \log A$ for $n = 1, \cdots, n_0 - 1$

(ii) $\log B < \sum_{\alpha=1}^{n_0} z_\alpha \le 0$

(iii) If the process is continued beyond the $n_0$th trial, it terminates
with the acceptance of $H_1$.

Clearly

(3:85)    $\beta(n_0) \le \beta + \rho_1(n_0)$

Since it is difficult to determine the value of $\rho_1(n_0)$, we shall
derive a simple upper bound for it. Let $\bar\rho_1(n_0)$ be the
probability under $H_1$ that condition (ii) is fulfilled, i.e.,

(3:86)    $\bar\rho_1(n_0) = P_1\left(\log B < \sum_{\alpha=1}^{n_0} z_\alpha \le 0\right)$

Then $\rho_1(n_0) \le \bar\rho_1(n_0)$ and we have

(3:87)    $\beta(n_0) \le \beta + \bar\rho_1(n_0)$

We shall now show how $\bar\rho_0(n_0)$ and $\bar\rho_1(n_0)$ can be
computed. We shall assume that $n_0$ is sufficiently large so that
$z_1 + \cdots + z_{n_0}$ may be regarded as a normally distributed
variable. When $H_i$ is true ($i = 0, 1$) the expected value of
$z_1 + \cdots + z_{n_0}$ is equal to $n_0 E_i(z)$ and the standard
deviation of $z_1 + \cdots + z_{n_0}$ is equal to
$\sqrt{n_0}\,\sigma_i(z)$ where $\sigma_i(z)$ denotes the standard
deviation of $z$ when $H_i$ is true. To compute $\bar\rho_0(n_0)$, we
shall write the inequality
$0 < \sum_{\alpha=1}^{n_0} z_\alpha < \log A$ in the following form:

(3:88)    $\dfrac{-n_0 E_0(z)}{\sqrt{n_0}\,\sigma_0(z)} < \dfrac{z_1 + \cdots + z_{n_0} - n_0 E_0(z)}{\sqrt{n_0}\,\sigma_0(z)} < \dfrac{\log A - n_0 E_0(z)}{\sqrt{n_0}\,\sigma_0(z)}$

Let

(3:89)    $\nu_1 = \dfrac{-n_0 E_0(z)}{\sqrt{n_0}\,\sigma_0(z)}$ and $\nu_2 = \dfrac{\log A - n_0 E_0(z)}{\sqrt{n_0}\,\sigma_0(z)}$

Since the middle term in (3:88) is normally distributed with zero mean
and unit variance when $H_0$ is true, the probability that (3:88) is
fulfilled when $H_0$ is true is equal to $G(\nu_2) - G(\nu_1)$ where
$G(\nu)$ denotes the probability that a normally distributed variable
with mean 0 and variance unity will take a value $< \nu$. Thus,

(3:90)    $\bar\rho_0(n_0) = G(\nu_2) - G(\nu_1)$

To compute $\bar\rho_1(n_0)$, we shall write the inequality
$\log B < \sum_{\alpha=1}^{n_0} z_\alpha \le 0$ in the following form:

(3:91)    $\dfrac{\log B - n_0 E_1(z)}{\sqrt{n_0}\,\sigma_1(z)} < \dfrac{z_1 + \cdots + z_{n_0} - n_0 E_1(z)}{\sqrt{n_0}\,\sigma_1(z)} \le \dfrac{-n_0 E_1(z)}{\sqrt{n_0}\,\sigma_1(z)}$

Let

(3:92)    $\nu_3 = \dfrac{\log B - n_0 E_1(z)}{\sqrt{n_0}\,\sigma_1(z)}$ and $\nu_4 = \dfrac{-n_0 E_1(z)}{\sqrt{n_0}\,\sigma_1(z)}$

Since the middle term in (3:91) is normally distributed with mean 0 and
variance unity when $H_1$ is true, the probability (under $H_1$) that
(3:91) holds is equal to $G(\nu_4) - G(\nu_3)$. Hence,

(3:93)    $\bar\rho_1(n_0) = G(\nu_4) - G(\nu_3)$

Our results can thus be summarized as follows:

(3:94)    $\alpha(n_0) \le \alpha + G(\nu_2) - G(\nu_1)$

and

(3:95)    $\beta(n_0) \le \beta + G(\nu_4) - G(\nu_3)$

where $\nu_1$, $\nu_2$, $\nu_3$, and $\nu_4$ are given in (3:89) and
(3:92). These upper bounds may considerably exceed $\alpha(n_0)$ and
$\beta(n_0)$, respectively. It would be desirable to find closer limits.

Table 3 shows the values of the upper bounds of $\alpha(n_0)$ and
$\beta(n_0)$ given in (3:94) and (3:95) corresponding to different pairs
$(\alpha,\beta)$ and different values of $n_0$. In these calculations we
have put $\log A = \log [(1-\beta)/\alpha]$ and
$\log B = \log [\beta/(1-\alpha)]$, and assumed that the distribution
under $H_0$ is normal with mean 0 and variance unity, and the
distribution under $H_1$ is normal with mean $\theta$ and variance unity.
For each pair $(\alpha,\beta)$ the value of $\theta$ has been determined
so that the number of observations required by the current most powerful
test of strength $(\alpha,\beta)$ is equal to 1000.

TABLE 3

Effect on Risks of Error of Truncating a Sequential Analysis
at a Predetermined Number of Trials

           $\alpha = .01$ and $\beta = .01$    $\alpha = .01$ and $\beta = .05$    $\alpha = .05$ and $\beta = .05$
 Number    Upper bound     Upper bound         Upper bound     Upper bound         Upper bound     Upper bound
 of        of effective    of effective        of effective    of effective        of effective    of effective
 Trials    $\alpha$        $\beta$             $\alpha$        $\beta$             $\alpha$        $\beta$

 1000      .020            .020                .033            .070                .095            .095
 1200      .015            .015                .024            .063                .082            .082
 1400      .013            .013                .019            .058                .072            .072
 1600      .012            .012                .016            .055                .066            .066
 1800      .011            .011                .014            .053                .062            .062
 2000      .010            .010                .012            .052                .058            .058
 2200      .010            .010                .012            .051                .056            .056
 2400      .010            .010                .011            .051                .055            .055
 2600      .010            .010                .011            .051                .053            .053
 2800      .010            .010                .010            .050                .053            .053
 3000      .010            .010                .010            .050                .052            .052

If the sequential analysis is based on the values $\alpha$ and $\beta$
shown, but a decision is made at $n_0$ trials even when the normal
sequential criteria would require a continuation of the process, the
realized values $\alpha(n_0)$ and $\beta(n_0)$ will not exceed the
tabular entries. The table relates to a test of the mean of a normally
distributed variate, the difference between the null and alternative
hypotheses being adjusted for each pair $(\alpha,\beta)$ so that the
number of trials required by the current test is 1000.

It seems to the author that the upper limits given in (3:94) and (3:95)
are considerably above the true $\alpha(n_0)$ and $\beta(n_0)$,
respectively, when $n_0$ is not much higher than the value of $n$ needed
for the current most powerful test.
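
The upper bounds (3:94) and (3:95) can be evaluated as in the following sketch, under the same assumptions as those stated for Table 3: a normal variate with unit variance, means 0 and $\theta$, so that $E_1(z) = -E_0(z) = \theta^2/2$ and the standard deviation of $z$ is $\theta$. The relation $\theta = (\lambda_0 - \lambda_1)/\sqrt{1000}$, with $G(\lambda_0) = 1 - \alpha$ and $G(\lambda_1) = \beta$, is assumed here as the content of (3:67). The function name and the parameter `n_fixed` are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal; STD.cdf plays the role of G


def truncation_error_bounds(alpha, beta, n0, n_fixed=1000):
    """Return the upper bounds of (3:94) and (3:95) for alpha(n0) and beta(n0)."""
    # Assumed form of (3:67): theta chosen so the fixed-sample test needs n_fixed trials.
    theta = (STD.inv_cdf(1 - alpha) - STD.inv_cdf(beta)) / math.sqrt(n_fixed)
    mean1 = 0.5 * theta ** 2             # E_1(z) = -E_0(z)
    root = math.sqrt(n0) * theta         # sqrt(n0) * sigma(z)
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    nu1 = n0 * mean1 / root              # (3:89), with -n0*E_0(z) = n0*mean1
    nu2 = (log_a + n0 * mean1) / root
    nu3 = (log_b - n0 * mean1) / root    # (3:92)
    nu4 = -n0 * mean1 / root
    alpha_bound = alpha + STD.cdf(nu2) - STD.cdf(nu1)   # (3:94)
    beta_bound = beta + STD.cdf(nu4) - STD.cdf(nu3)     # (3:95)
    return alpha_bound, beta_bound


for n0 in (1000, 2000, 3000):
    print(n0, [round(b, 3) for b in truncation_error_bounds(0.01, 0.05, n0)])
```

Values obtained in this way should agree with Table 3 to within about a unit in the third decimal place; both bounds tend to $\alpha$ and $\beta$ as $n_0$ increases.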

3.9 Increase in the Expected Number of Observations Caused by
Replacing the Exact Values $A(\alpha,\beta)$ and $B(\alpha,\beta)$ by
$(1-\beta)/\alpha$ and $\beta/(1-\alpha)$, Respectively

The quantities $A(\alpha,\beta)$ and $B(\alpha,\beta)$ denote the values
of $A$ and $B$ for which the probabilities of errors of the first and
second kinds associated with the sequential probability ratio test are
exactly $\alpha$ and $\beta$, respectively. In Section 3.3 it has been
recommended that $A(\alpha,\beta)$ and $B(\alpha,\beta)$ be replaced by
$a(\alpha,\beta) = (1-\beta)/\alpha$ and
$b(\alpha,\beta) = \beta/(1-\alpha)$, respectively. This may slightly
increase the expected number of observations, since
$a(\alpha,\beta) \ge A(\alpha,\beta)$ and
$b(\alpha,\beta) \le B(\alpha,\beta)$.¹⁴ The present section gives
estimates of the amount of such increase in the expected number of
observations.

In Section 3.5 the following approximation formula has been obtained for
the expected number of observations:

(3:96)    $E_\theta(n) \sim \dfrac{L(\theta) \log B + [1 - L(\theta)] \log A}{E_\theta(z)}$

Since $L(\theta_0) = 1 - \alpha$ and $L(\theta_1) = \beta$, we obtain
from (3:96)

(3:97)    $E_0(n) \sim \dfrac{(1-\alpha) \log B + \alpha \log A}{E_0(z)}$

and

(3:98)    $E_1(n) \sim \dfrac{\beta \log B + (1-\beta) \log A}{E_1(z)}$

$E_i(n)$ denotes the expected value of $n$ when $\theta_i$ is true.

¹⁴ See inequalities (3:21) and (3:22).

Thus, the changes $\Delta E_0(n)$ and $\Delta E_1(n)$ in the expected
values $E_0(n)$ and $E_1(n)$ caused by using $a(\alpha,\beta)$ and
$b(\alpha,\beta)$ instead of $A(\alpha,\beta)$ and $B(\alpha,\beta)$,
respectively, are given by

(3:99)    $\Delta E_0(n) \sim \dfrac{(1-\alpha)\,[\log b(\alpha,\beta) - \log B(\alpha,\beta)] + \alpha\,[\log a(\alpha,\beta) - \log A(\alpha,\beta)]}{E_0(z)}$

          $= \dfrac{(1-\alpha) \log \dfrac{b(\alpha,\beta)}{B(\alpha,\beta)} + \alpha \log \dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}}{E_0(z)}$

and

(3:100)   $\Delta E_1(n) \sim \dfrac{\beta\,[\log b(\alpha,\beta) - \log B(\alpha,\beta)] + (1-\beta)\,[\log a(\alpha,\beta) - \log A(\alpha,\beta)]}{E_1(z)}$

          $= \dfrac{\beta \log \dfrac{b(\alpha,\beta)}{B(\alpha,\beta)} + (1-\beta) \log \dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}}{E_1(z)}$


Formulas (3:99) and (3:100) are, of course, approximation formulas, since
(3:97) and (3:98) are approximations. However, if the error in the
formulas (3:97) and (3:98), i.e., if the differences

(3:101a)  $E_0(n) - \dfrac{(1-\alpha) \log B + \alpha \log A}{E_0(z)}$

and

(3:101b)  $E_1(n) - \dfrac{\beta \log B + (1-\beta) \log A}{E_1(z)}$

were exactly independent of the quantities $A$ and $B$, then in (3:99)
and (3:100) the equality sign would hold exactly. It can be shown that
small changes in $A$ and $B$ affect the differences (3:101) exceedingly
little, and, therefore, (3:99) and (3:100) are very close approximations.
We shall derive upper bounds for the right-hand members of (3:99) and
(3:100). Since $E_0(z)$ and
$\log \dfrac{b(\alpha,\beta)}{B(\alpha,\beta)}$ are negative,¹⁵ while
$\log \dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}$ is positive, we have

(3:102)   $\dfrac{(1-\alpha) \log \dfrac{b(\alpha,\beta)}{B(\alpha,\beta)} + \alpha \log \dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}}{E_0(z)} < \dfrac{(1-\alpha) \log \dfrac{b(\alpha,\beta)}{B(\alpha,\beta)}}{E_0(z)} < \dfrac{1}{E_0(z)} \log \dfrac{b(\alpha,\beta)}{B(\alpha,\beta)}$

Similarly, since $E_1(z)$ and
$\log \dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}$ are positive, while
$\log \dfrac{b(\alpha,\beta)}{B(\alpha,\beta)}$ is negative, we have

(3:103)   $\dfrac{\beta \log \dfrac{b(\alpha,\beta)}{B(\alpha,\beta)} + (1-\beta) \log \dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}}{E_1(z)} < \dfrac{(1-\beta) \log \dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}}{E_1(z)} < \dfrac{1}{E_1(z)} \log \dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}$

¹⁵ It is remarked at the end of Section A.2.1 that $E(z)$ and a certain
quantity $h_0$ defined there have opposite signs. Since $h_0 = 1$ if
$H_0$ is true, and $h_0 = -1$ if $H_1$ is true, it follows that
$E_0(z) < 0$ and $E_1(z) > 0$.


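Thus, for all practical purposes
$\dfrac{1}{E_1(z)} \log \dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}$ is an
upper bound for $\Delta E_1(n)$ and
$\dfrac{1}{E_0(z)} \log \dfrac{b(\alpha,\beta)}{B(\alpha,\beta)}$ is an
upper bound for $\Delta E_0(n)$. The exact values $A(\alpha,\beta)$ and
$B(\alpha,\beta)$ not being known, we cannot yet use these limits. Since
$E_1(z) > 0$, an upper limit of
$\dfrac{1}{E_1(z)} \log \dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}$ is
obtained by substituting for
$\dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}$ an upper bound of
$\dfrac{a(\alpha,\beta)}{A(\alpha,\beta)}$. Similarly, since
$E_0(z) < 0$, an upper limit of
$\dfrac{1}{E_0(z)} \log \dfrac{b(\alpha,\beta)}{B(\alpha,\beta)}$ can be
obtained by substituting for
$\dfrac{b(\alpha,\beta)}{B(\alpha,\beta)}$ a lower bound of
$\dfrac{b(\alpha,\beta)}{B(\alpha,\beta)}$.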

From equations (A:29) and (A:30) in the Appendix one can derive the
following inequalities:

(3:104)   $\dfrac{a(\alpha,\beta)}{A(\alpha,\beta)} \le \delta_{\theta_0}$

and

(3:105)   $\dfrac{b(\alpha,\beta)}{B(\alpha,\beta)} \ge \eta_{\theta_0}$

where the quantities $\delta_{\theta_0}$ and $\eta_{\theta_0}$ are
defined by equations (A:27) and (A:28).¹⁶ The quantities $\delta_\theta$
and $\eta_\theta$ have been explicitly computed for binomial and normal
distributions.

Thus, we arrive at the following result: For all practical purposes we
may regard $(\log \delta_{\theta_0})/E_1(z)$ as an upper bound for
$\Delta E_1(n)$ and $(\log \eta_{\theta_0})/E_0(z)$ as an upper bound for
$\Delta E_0(n)$.

TABLE 4

Increase in Expected Number of Observations Resulting from
Approximations in Criteria for Terminating a Sequential
Process

 Number of
 Observations Needed     $\alpha = .01$     $\alpha = .01$     $\alpha = .05$
 for the Current         $\beta = .01$      $\beta = .05$      $\beta = .05$
 Most Powerful Test

        10                    1.1                1.3                1.6
        30                    1.9                2.2                2.7
       100                    3.4                4.0                4.9
       200                    4.9                5.7                6.9
       500                    7.7                9.0               10.9
      1000                   10.8               12.7               15.4

The tabular entries may, for practical purposes, be treated as upper
bounds of the exact increases. The table relates to a test of the mean of
a normally distributed variate, the difference between the null and
alternative hypotheses being adjusted for each pair of values of
$\alpha$ and $\beta$ so that the number of trials required under the best
current test is as shown in the left-hand column.


¹⁶ This can be seen as follows: Substituting $A(\alpha,\beta)$ for $A$,
$B(\alpha,\beta)$ for $B$, and $\theta_0$ for $\theta$, we obtain from
(A:29) and (A:30)

          $[B(\alpha,\beta)]^{h(\theta_0)}\,\eta_{\theta_0} \le E^{*}_{\theta_0}$

and

          $E^{**}_{\theta_0} \le [A(\alpha,\beta)]^{h(\theta_0)}\,\delta_{\theta_0}$

Since we let $A = A(\alpha,\beta)$ and $B = B(\alpha,\beta)$, we have
$L(\theta_0) = 1 - \alpha$ and $L(\theta_1) = \beta$. It follows from
this and the two equations which are obtained from (A:18) by substituting
$\theta_0$ and $\theta_1$ for $\theta$ that

          $E^{*}_{\theta_0} = \dfrac{\beta}{1-\alpha} = b(\alpha,\beta)$ and $E^{**}_{\theta_0} = \dfrac{1-\beta}{\alpha} = a(\alpha,\beta)$

Since $h(\theta_0) = 1$, we obtain

          $B(\alpha,\beta)\,\eta_{\theta_0} \le b(\alpha,\beta)$ and $a(\alpha,\beta) \le A(\alpha,\beta)\,\delta_{\theta_0}$

from which (3:104) and (3:105) follow.

As an example, consider the case in which the distribution under $H_0$ is
normal with zero mean and unit variance, and the distribution under $H_1$
is normal with mean $\theta$ and variance unity. Since for the normal
distribution $\eta_\theta = 1/\delta_\theta$ [see equation (A:51)] and
$-E_0(z) = E_1(z)$, the upper bound of $\Delta E_0(n)$ is the same as the
upper bound of $\Delta E_1(n)$. This upper bound depends only on the
value of $\theta$. For any pair $(\alpha,\beta)$ and for any positive
integer $m$ there exists exactly one value of $\theta$ such that $m$
observations are needed for the current most powerful test of strength
$(\alpha,\beta)$. Thus, with each integer $m$ and pair $(\alpha,\beta)$
there is associated exactly one value of $\theta$. Table 4 shows the
common upper bound of $\Delta E_0(n)$ and $\Delta E_1(n)$ calculated for
values of $\theta$ corresponding to different pairs $(\alpha,\beta)$ and
integers $m$.
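
The one-to-one correspondence between $m$ and $\theta$ mentioned above can be made concrete by the following sketch. It assumes that for a normal variate with unit variance the current most powerful test of strength $(\alpha,\beta)$ requires $m = (\lambda_0 - \lambda_1)^2/\theta^2$ observations, where $G(\lambda_0) = 1 - \alpha$ and $G(\lambda_1) = \beta$; this relation is taken here as the content of (3:67) and should be checked against that formula. The function name is hypothetical, and the sketch does not compute the bounds of Table 4 themselves, which require the quantity $\delta_\theta$ of the Appendix.

```python
import math
from statistics import NormalDist


def theta_for_fixed_sample_size(alpha: float, beta: float, m: int) -> float:
    """Return the positive theta for which the fixed-sample test needs m observations."""
    std = NormalDist()
    lam0 = std.inv_cdf(1 - alpha)   # assumed: G(lam0) = 1 - alpha
    lam1 = std.inv_cdf(beta)        # assumed: G(lam1) = beta
    return (lam0 - lam1) / math.sqrt(m)


for m in (10, 100, 1000):
    print(m, round(theta_for_fixed_sample_size(0.01, 0.01, m), 4))
```

For fixed $(\alpha,\beta)$ the value of $\theta$ decreases as $m$ increases, which is consistent with the tabular entries growing with $m$.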



Chapter 4. OUTLINE OF A THEORY OF SEQUENTIAL TESTS 
OF SIMPLE AND COMPOSITE HYPOTHESES AGAINST A SET 
OF ALTERNATIVES 

In Chapter 3 we were concerned mainly with the theoretical case of
testing a simple hypothesis $H_0$ against a single alternative $H_1$. In
problems arising in applications, the unknown parameter, or parameters,
can usually take infinitely many values. In this chapter we shall discuss
sequential tests of simple and composite hypotheses against infinitely
many alternatives.

4.1 Tests of Simple Hypotheses 

4.1.1 Introductory Remarks 

A simple hypothesis has been defined as a statement which specifies
completely the values of all the unknown parameters. We should like to
make some remarks concerning the conditions under which a test of a
simple hypothesis is meaningful and appropriate. For this purpose it will
be sufficient to consider the case in which there is only one unknown
parameter $\theta$ involved in the distribution of the random variable
$x$ under consideration. A simple hypothesis is then a statement that
$\theta$ is equal to some specified value $\theta_0$.

In applications the problem of testing a hypothesis usually arises as
follows: There are two alternative courses of action, say action 1 and
action 2, between which a decision is to be made, and the preference for
one or the other action depends on the value of the parameter $\theta$.
Let $\omega$ denote the set of all values of $\theta$ for which action 1
is preferred to action 2; then action 2 is preferred to action 1 for all
values $\theta$ outside $\omega$.¹ Let $H_\omega$ be the hypothesis that
$\theta$ is contained in $\omega$. Then the problem of deciding between
the two courses of action can be formulated as the problem of testing the
hypothesis $H_\omega$. If $H_\omega$ is accepted we take action 1 and if
$H_\omega$ is rejected we take action 2. If the degree of preference for
one or the other action varies continuously with the value of $\theta$,
the set $\omega$ cannot consist of a single value $\theta_0$. In fact, if
$\omega$ were to contain only the single value $\theta_0$, it would mean
that we prefer action 1 when $\theta = \theta_0$ and we prefer action 2
for any $\theta \ne \theta_0$, no matter how near $\theta$ is to
$\theta_0$. Thus, we would have a discontinuity in our preference scale
at $\theta = \theta_0$.

¹ For values $\theta$ on the boundary of $\omega$ it will usually be
inconsequential which action is taken.

We see that the problem of testing a simple hypothesis arises, strictly
speaking, only if there is a discontinuity in our preference scale for
actions 1 and 2. While a discontinuity in the preference scale is, of
course, possible, it will occur rather seldom. A discontinuity in the
preference scale may occur, for example, if we want to test the validity
of some hypothetical scientific theory which implies that the parameter
$\theta$ must have a specified value $\theta_0$. In such a case any
deviation of the value of $\theta$ from $\theta_0$, no matter how small,
is of importance, since it invalidates the hypothetical theory in
question.

Whenever the degree of preference for one or the other action varies
continuously with the value of $\theta$, the hypothesis to be tested will
have to be, strictly speaking, a composite one. Nevertheless, frequently
it will be expedient to approximate the composite hypothesis by a simple
one, since the latter is usually a simpler problem to treat. As an
illustration, consider the following example: Suppose that the hardness
$x$ of a material varies from unit to unit and is normally distributed
with a known variance. The mean value $\theta$ of $x$ is, however,
unknown. Suppose that $\theta_0$ is considered to be the most desirable
value of $\theta$ and the material is considered less desirable the
greater $|\theta - \theta_0|$. Let action 1 be acceptance of the material
and action 2, rejection of the material. Preference for acceptance is
strongest when $\theta = \theta_0$. The preference for acceptance will
decrease steadily as $|\theta - \theta_0|$ increases. There will be a
positive value $\delta$ such that for $|\theta - \theta_0| > \delta$
rejection of the material is preferred and the degree of preference for
rejection increases with increasing value of $|\theta - \theta_0|$ in the
domain $|\theta - \theta_0| > \delta$. If
$|\theta - \theta_0| = \delta$, i.e., if the quality of the product is
just on the margin, neither action is preferable to the other. In such a
situation the proper hypothesis to be tested is the composite hypothesis
that $|\theta - \theta_0| \le \delta$. However, if $\delta$ is small, the
composite hypothesis may be replaced for practical purposes by the simple
hypothesis that $\theta = \theta_0$. The test of the hypothesis that
$\theta = \theta_0$ will have nearly the same operating characteristic
function as the test of the hypothesis that
$|\theta - \theta_0| \le \delta$, for the following reasons. To test the
hypothesis that $|\theta - \theta_0| \le \delta$ we subdivide the
$\theta$-axis into three zones: zone of preference for acceptance, zone
of preference for rejection, and zone of indifference. As explained in
Section 2.3.1, the zone of preference for acceptance consists of all
values $\theta$ for which acceptance is strongly preferred, i.e., for
which the rejection of the material is considered an error of practical
importance. Similarly, the zone of preference for rejection consists of
all those values $\theta$ for which rejection is strongly preferred,
whereas for values $\theta$ in the indifferent zone the preference for
one action over the other is only slight and we do not care particularly
which action is taken. In our example the three zones may reasonably be
defined as follows. We select two positive values $\delta_0 < \delta$ and
$\delta_1 > \delta$. The zone of preference for acceptance is given by
$|\theta - \theta_0| \le \delta_0$, the zone of preference for rejection
by $|\theta - \theta_0| \ge \delta_1$, and the zone of indifference by
$\delta_0 < |\theta - \theta_0| < \delta_1$. The test procedure will then
be constructed so that the probability of rejection will not exceed a
preassigned value $\alpha$ whenever $\theta$ is in the zone of preference
for acceptance, and the probability of acceptance will not exceed a
preassigned value $\beta$ whenever $\theta$ is in the zone of preference
for rejection.² Now if we replace the original composite hypothesis by
the simple hypothesis that $\theta = \theta_0$, the zone of preference
for acceptance will consist of the single value $\theta = \theta_0$. The
zone of preference for rejection may be defined, as before, by
$|\theta - \theta_0| \ge \delta_1$. The zone of indifference is then
given by $0 < |\theta - \theta_0| < \delta_1$. The test procedure for
testing that $\theta = \theta_0$ will then satisfy the requirement that
the probability of rejecting the hypothesis is $\alpha$ when
$\theta = \theta_0$ and the probability of accepting the hypothesis does
not exceed $\beta$ whenever $|\theta - \theta_0| \ge \delta_1$. If
$\delta_0$ is very small, the test of the hypothesis that
$\theta = \theta_0$ will satisfy the requirements imposed on the test of
the original composite hypothesis with close approximation, since the
probability of rejecting the hypothesis will be nearly equal to $\alpha$
for values $\theta$ in a sufficiently small neighborhood of $\theta_0$.
Thus, for practical purposes we may replace the original composite
hypothesis by the simple hypothesis that $\theta = \theta_0$.

As we have seen, a test of a simple hypothesis will occur in applications
in two cases: (1) when there is a discontinuity in the preference scale
and the problem calls for testing a simple hypothesis in the strict sense
(these cases are rare); (2) when the problem is such that it calls for
testing a composite hypothesis and it is approximated by a simple
hypothesis merely for the sake of simplicity.

In terms of the zones of preference for acceptance, of preference for 
rejection, and of indifference, the simple hypothesis may be character- 
ized by the condition that the zone of preference for acceptance con- 
sists of a single point. 

4.1.2 Test of a Simple Hypothesis against One-Sided Alternatives 

We shall discuss here the simple case in which there is only one unknown
parameter $\theta$ and the hypothesis that $\theta = \theta_0$ is tested
against alternative values of $\theta$ which lie on one side of
$\theta_0$, say $> \theta_0$. In other words, only values of
$\theta > \theta_0$ are considered admissible alternatives to the
hypothesis to be tested. In this case the zone of preference for
acceptance consists of the single value $\theta_0$. The degree of
preference for rejection of the hypothesis will generally increase with
increasing value of $\theta$ in the domain $\theta > \theta_0$. It will,
therefore, be possible to find a value $\theta_1 > \theta_0$ such that
the acceptance of the hypothesis is considered an error of practical
importance whenever $\theta \ge \theta_1$, while for values
$\theta > \theta_0$ but $< \theta_1$ the acceptance of the hypothesis is
an error of no particular practical consequence. Thus, the zone of
preference for rejection may be defined by $\theta \ge \theta_1$, and the
zone of indifference by $\theta_0 < \theta < \theta_1$.

² In this connection see Section 2.3.2.

According to Section 2.3.2 we shall impose the following requirements on
the OC function of the test. The probability that the hypothesis will be
rejected should be equal to a preassigned value $\alpha$ when
$\theta = \theta_0$. The probability of accepting the hypothesis should
not exceed a preassigned value $\beta$ whenever $\theta \ge \theta_1$.

In most of the important cases occurring in practice, such as when $x$
has a normal, binomial, or Poisson distribution, and so on, the
sequential probability ratio test of strength $(\alpha,\beta)$ for
testing the hypothesis that $\theta = \theta_0$ against the single
alternative $\theta_1$ will satisfy the imposed requirements, since the
probability of an error of the second kind will decrease steadily with
increasing values of $\theta$ in the domain $\theta \ge \theta_1$. Thus,
in all these cases the sequential probability ratio test for testing the
hypothesis that $\theta = \theta_0$ against a properly chosen alternative
$\theta_1$ provides a satisfactory solution to our problem.

The case in which the alternative values of $\theta$ are restricted to
values $\theta < \theta_0$ instead of values $\theta > \theta_0$ is
entirely analogous and need not be discussed separately.

4.1.3 Test of a Simple Hypothesis with No Restrictions on the 
Alternative Values of the Unknown Parameters 

In this section we shall deal with the following general problem: The
distribution of $x$ involves $k$ unknown parameters
$\theta_1, \ldots, \theta_k$ and the hypothesis $H_0$ to be tested is
that $\theta_1, \ldots, \theta_k$ are equal to some specified values
$\theta_1^0, \ldots, \theta_k^0$, respectively. The set of $k$
parameters $(\theta_1, \ldots, \theta_k)$ will be denoted by $\theta$
without any subscript and will be referred to as a parameter point. The
use of a superscript to the letter $\theta$, such as $\theta^0$ or
$\theta^1$, etc., will indicate that a particular parameter point is
meant. Our hypothesis $H_0$ can thus be expressed by stating that the
unknown parameter point $\theta$ is equal to the particular parameter
point $\theta^0$.

As we have seen in the preceding section, the zone of preference for
acceptance consists of the single parameter point $\theta^0$. Denote
the zone of preference for rejection by $\omega_r$. This will usually
be the set of all points $\theta$ whose "distance" (defined in some
sense) from $\theta^0$ is greater than or equal to some given positive
value. The requirements imposed on the OC function of the test, as
formulated in Section 2.3.2, can then be stated as follows: The
probability that $H_0$ will be rejected when $\theta = \theta^0$ should
be equal to a preassigned value $\alpha$ and the probability that $H_0$
will be accepted should not exceed a preassigned value $\beta$ for any
parameter point $\theta$ in the zone $\omega_r$.

Before we discuss the problem of constructing a proper sequential
test satisfying the above requirements, we shall consider the problem
of finding a proper test procedure satisfying the following modified
requirements. For any $\theta$ in $\omega_r$ let $\beta(\theta)$ denote
the probability that $H_0$ will be accepted when $\theta$ is the true
parameter point. Thus $\beta(\theta)$ is the probability of an error of
the second kind when $\theta$ is true. Our original requirement was
that $\beta(\theta)$ should not exceed a preassigned value $\beta$ for
all $\theta$ in $\omega_r$. Instead we shall now require that the
weighted average of $\beta(\theta)$, weighted with a given weight
function $w(\theta)$, should be equal to $\beta$, i.e.,

(4:1)    $$\int_{\omega_r} \beta(\theta)\, w(\theta)\, d\theta = \beta$$

where $w(\theta) \geq 0$ for all $\theta$ in $\omega_r$ and³

(4:2)    $$\int_{\omega_r} w(\theta)\, d\theta = 1$$



The requirement that the probability of rejecting $H_0$ when $H_0$ is
true be equal to a preassigned $\alpha$ is maintained as before. A
proper sequential test procedure satisfying these modified requirements
can easily be constructed. Let $p_{0n}$ be the probability distribution
of the sample $(x_1, \ldots, x_n)$ when $H_0$ is true, i.e.,

(4:3)    $$p_{0n} = f(x_1, \theta_1^0, \ldots, \theta_k^0)\, f(x_2, \theta_1^0, \ldots, \theta_k^0) \cdots f(x_n, \theta_1^0, \ldots, \theta_k^0)$$

Furthermore, let $p_{1n}$ be defined by

(4:4)    $$p_{1n} = \int_{\omega_r} f(x_1, \theta_1, \ldots, \theta_k) \cdots f(x_n, \theta_1, \ldots, \theta_k)\, w(\theta)\, d\theta$$

3 The weight function $w(\theta)$ may also be discrete. A single formula
valid for both continuous and discrete weight functions could be given
by using Stieltjes integrals in (4:1) and (4:2).

Thus, $p_{1n}$ is a weighted average of the probability distribution
functions $f(x_1, \theta_1, \ldots, \theta_k) \cdots f(x_n, \theta_1, \ldots, \theta_k)$
corresponding to various parameter points $\theta$ in $\omega_r$. As
such, $p_{1n}$ itself is a probability distribution function of the
sample $(x_1, \ldots, x_n)$.⁴ Let $H_1$ denote the hypothesis that the
distribution of the sample $(x_1, \ldots, x_n)$ is given by $p_{1n}$
defined in (4:4). Then $H_1$ is a simple hypothesis, since it specifies
completely the distribution. Consider the sequential probability ratio
test of strength $(\alpha, \beta)$ for testing $H_0$ against the simple
alternative hypothesis $H_1$. This procedure is given as follows.
Reject $H_0$ if


(4:5)    $$\frac{p_{1n}}{p_{0n}} \geq A;$$

accept $H_0$ if

(4:6)    $$\frac{p_{1n}}{p_{0n}} \leq B;$$

and take an additional observation if

(4:7)    $$B < \frac{p_{1n}}{p_{0n}} < A.$$

The expressions $p_{0n}$ and $p_{1n}$ are given by (4:3) and (4:4),
respectively, and the constants $A$ and $B$ are to be chosen so that
the test will have the required strength $(\alpha, \beta)$. As we have
seen in Section 3.3, for most practical purposes we may use the
approximation formulas $A = (1 - \beta)/\alpha$ and
$B = \beta/(1 - \alpha)$.⁵
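The decision rule (4:5) to (4:7), together with the approximation formulas for $A$ and $B$, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `wald_thresholds`, `error_rates`, and `sprt_decision` are hypothetical, and `error_rates` simply inverts the two approximation formulas algebraically, giving $\alpha = (1 - B)/(A - B)$ and $\beta = B(A - 1)/(A - B)$.

```python
def wald_thresholds(alpha, beta):
    """Approximate constants A = (1 - beta)/alpha and B = beta/(1 - alpha)."""
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need 0 < alpha, 0 < beta and alpha + beta < 1")
    return (1 - beta) / alpha, beta / (1 - alpha)


def error_rates(A, B):
    """Invert the approximations: returns (alpha, beta) implied by A and B."""
    return (1 - B) / (A - B), B * (A - 1) / (A - B)


def sprt_decision(ratio, A, B):
    """Apply (4:5)-(4:7) to the current value of p_1n / p_0n."""
    if ratio >= A:
        return "reject"      # (4:5): reject H_0
    if ratio <= B:
        return "accept"      # (4:6): accept H_0
    return "continue"        # (4:7): take an additional observation
```

For example, `wald_thresholds(0.05, 0.10)` gives $A$ close to 18 and $B$ close to 0.105, and any ratio strictly between these two values calls for another observation.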

The sequential probability ratio test defined by (4:5), (4:6), and
(4:7) can be shown to satisfy the relation (4:1). Thus, this probability
ratio test may be regarded as a satisfactory solution to our problem if
our requirement is that the probability of an error of the first kind
should be $\alpha$ and that $\beta(\theta)$ should satisfy (4:1).

In practical problems, however, it seems more reasonable to maintain
the original requirements. That is to say, we shall want a test
procedure such that the probability $\beta(\theta)$ of accepting $H_0$
does not exceed $\beta$ for all parameter points $\theta$ in the zone
$\omega_r$, and the probability is $\alpha$ that we shall reject $H_0$
when $\theta = \theta^0$. There are, in general, infinitely many
sequential tests which satisfy these requirements, and we want to
select one for which the expected number of observations is as small as
possible.

4 The distribution of the sample $(x_1, \ldots, x_n)$ will be precisely
given by $p_{1n}$ if we assume that $\theta$ in $\omega_r$ has a
probability distribution given by the density function $w(\theta)$.

5 Although the successive observations $x_1, x_2, \ldots$, etc., are not
independent when $H_1$ is true ($p_{1n}$ cannot be represented as a
product of $n$ factors where the $\alpha$th factor depends only on
$x_\alpha$), the results and conclusions in Sections 3.2 and 3.3 remain
valid, as pointed out in Section 3.2.

Although a thorough investigation of this problem has not yet been
made, the following approach may perhaps be reasonable. First we
restrict ourselves to the class $C$ of sequential probability ratio
tests based on the ratio $p_{1n}/p_{0n}$, where $p_{0n}$ is given by
(4:3) and $p_{1n}$ by (4:4), corresponding to an arbitrary non-negative
weight function $w(\theta)$ satisfying (4:2).⁶ Thus, the class $C$
contains at least as many tests as there are possible weight functions
$w(\theta)$ satisfying (4:2). A test in class $C$ is uniquely determined
by choosing a particular weight function $w(\theta)$ and particular
values for $A$ and $B$. The test procedure is then carried out in the
usual way. $H_0$ is accepted if $p_{1n}/p_{0n} \leq B$, $H_0$ is
rejected if $p_{1n}/p_{0n} \geq A$, and an additional observation is
made if $B < p_{1n}/p_{0n} < A$. The restriction to the class $C$ of
sequential tests is suggested by the fact that we have been led to
these tests by the requirement that some weighted average of the
probabilities of errors of the second kind be equal to a given value
$\beta$.

Accepting the restriction that the sequential test should be a member
of the class $C$, we still need a principle for choosing the weight
function $w(\theta)$. Suppose that the quantities $A$ and $B$ have
already been determined. Let us then examine what would be a reasonable
choice of $w(\theta)$. After $A$ and $B$ have been chosen, the
probability $\alpha$ of making an error of the first kind is also
determined for practical purposes and the choice of $w(\theta)$ will
not affect it.⁷ Thus, the choice of $w(\theta)$ will affect only
$\beta(\theta)$. A weight function $w(\theta)$ may be regarded the more
favorable the smaller the maximum value of $\beta(\theta)$ with respect
to $\theta$ ($\theta$ is, of course, restricted to points in
$\omega_r$). Thus, the following choice of $w(\theta)$ seems
reasonable: For given values of $A$ and $B$ the weight function
$w(\theta)$ is chosen for which the maximum of $\beta(\theta)$ with
respect to $\theta$ ($\theta$ restricted to points in $\omega_r$) takes
its smallest value. When this principle for the choice of $w(\theta)$
is adopted, $\alpha$ and the maximum of $\beta(\theta)$ with respect to
$\theta$ ($\theta$ in $\omega_r$) will depend only on the quantities
$A$ and $B$. These values $A$ and $B$ are then determined so that the
probability of an error of the first kind has the desired value
$\alpha$ and the maximum of $\beta(\theta)$ with respect to $\theta$ is
equal to the required value $\beta$.

6 Instead of defining $p_{1n}$ by some weighted average of the type
given in (4:4), it would seem equally reasonable to define $p_{1n}$ as
the maximum of $f(x_1, \theta) \cdots f(x_n, \theta)$ with respect to
$\theta$ where $\theta$ is restricted to points in $\omega_r$. Then the
ratio $p_{1n}/p_{0n}$ would coincide with the so-called likelihood
ratio introduced by J. Neyman and E. Pearson and widely used in current
test procedures. Our reason for preferring weighted averages is that
the theory of such tests seems to be considerably simpler. If $p_{1n}$
were defined by the maximum with respect to $\theta$ in $\omega_r$,
$p_{1n}$ would no longer be a probability distribution.

7 In fact, with good approximation the following relations hold:
$(1 - \bar{\beta})/\alpha \approx A$ and
$\bar{\beta}/(1 - \alpha) \approx B$ where
$\bar{\beta} = \int_{\omega_r} \beta(\theta)\, w(\theta)\, d\theta$.
Solving these equations with respect to $\alpha$ and $\bar{\beta}$ we
obtain $\alpha = (1 - B)/(A - B)$ and
$\bar{\beta} = [B(A - 1)]/(A - B)$. Thus, $\alpha$ and $\bar{\beta}$
depend only on $A$ and $B$.

There is no general method yet available for the determination of an
optimum weight function $w(\theta)$ in the sense defined above. For
some special but important cases, however, such a weight function has
been determined. This point is discussed in Section A.8.


4.1.4 Application of the General Procedure to Testing the Mean 
of a Normal Distribution with Known Variance 


In this section we shall consider the problem of testing the simple
hypothesis $H_0$ that the mean $\theta$ of a normal distribution with
known variance is equal to a particular value $\theta_0$. The
acceptance of $H_0$ will not be considered a serious error if
$\theta \neq \theta_0$ but is near $\theta_0$. However, there will be,
in general, a positive value $\delta$ such that the acceptance of $H_0$
is considered an error of practical importance if (and only if)
$\left| \frac{\theta - \theta_0}{\sigma} \right| \geq \delta$, where
$\sigma$ denotes the known standard deviation of the distribution.
Thus, the region of preference for rejection may be defined as the set
of all values $\theta$ for which
$\left| \frac{\theta - \theta_0}{\sigma} \right| \geq \delta$. The
region of preference for acceptance will consist of the single value
$\theta_0$, and the region of indifference will be the set of all
values $\theta$ for which
$0 < \left| \frac{\theta - \theta_0}{\sigma} \right| < \delta$.

The probability density of the sample $(x_1, \ldots, x_n)$ under $H_0$
is given by

(4:8)    $$p_{0n} = \frac{1}{(2\pi)^{n/2} \sigma^n}\, e^{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{n} (x_\alpha - \theta_0)^2}$$


According to the general theory discussed in the preceding section,
$p_{1n}$ is defined as some weighted average of the probability density
corresponding to various values of $\theta$ in the zone of preference
for rejection. It is shown in Section A.8.2 that an optimum weighted
average is the simple average of the two density functions: the density
function corresponding to $\theta = \theta_0 - \delta\sigma$ and the
density function corresponding to $\theta = \theta_0 + \delta\sigma$.
Thus,

(4:9)    $$p_{1n} = \frac{1}{2} \left[ \frac{1}{(2\pi)^{n/2} \sigma^n}\, e^{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{n} (x_\alpha - \theta_0 + \delta\sigma)^2} + \frac{1}{(2\pi)^{n/2} \sigma^n}\, e^{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{n} (x_\alpha - \theta_0 - \delta\sigma)^2} \right]$$

The test is then carried out as follows. We continue taking
observations as long as $B < p_{1n}/p_{0n} < A$. If
$p_{1n}/p_{0n} \geq A$, we reject $H_0$. If $p_{1n}/p_{0n} \leq B$, we
accept $H_0$. To make the probability of an error of the first kind
equal to $\alpha$ and the maximum of $\beta(\theta)$ (in the domain
$\left| \frac{\theta - \theta_0}{\sigma} \right| \geq \delta$) equal to
$\beta$, for all practical purposes we may put

$$A = (1 - \beta)/\alpha \quad \text{and} \quad B = \beta/(1 - \alpha).$$

A more detailed discussion of this test procedure is given in Part II,
Chapter 9.
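As an illustration, the ratio of (4:9) to (4:8) can be put in closed form. Expanding the squares in the exponents and dividing gives $p_{1n}/p_{0n} = e^{-n\delta^2/2} \cosh\!\left(\delta \sum_\alpha (x_\alpha - \theta_0)/\sigma\right)$; this simplification is worked out here and is not quoted from the text. The Python sketch below uses it, with hypothetical function names and the approximate constants $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$.

```python
import math


def two_sided_normal_ratio(xs, theta0, sigma, delta):
    """p_1n / p_0n for (4:9) over (4:8), using the cosh simplification."""
    n = len(xs)
    s = sum(x - theta0 for x in xs) / sigma
    return math.exp(-n * delta ** 2 / 2) * math.cosh(delta * s)


def two_sided_normal_test(xs, theta0, sigma, delta, alpha, beta):
    """Run the test over xs in order; returns (decision, observations used)."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    s = 0.0  # running value of sum (x - theta0) / sigma
    n = 0
    for n, x in enumerate(xs, start=1):
        s += (x - theta0) / sigma
        ratio = math.exp(-n * delta ** 2 / 2) * math.cosh(delta * s)
        if ratio >= A:
            return "reject", n
        if ratio <= B:
            return "accept", n
    return "continue", n  # data exhausted before a decision was reached
```

Because the ratio depends on the observations only through the running sum of $(x_\alpha - \theta_0)/\sigma$, the test can be updated with one addition per observation.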


4.2 Tests of Composite Hypotheses 

4.2.1 Discussion of an Important Special Case 

A frequent and important problem is that of testing the hypothesis
$H$ that the unknown parameter $\theta$ does not exceed a specified
value $\theta'$.⁸ This problem is of particular importance in quality
control of manufactured products. The importance of an error of the
first kind (rejection of $H$ when $H$ is true), or that of an error of
the second kind (acceptance of $H$ when $H$ is false), will usually
vary with the value of $\theta$. For example, if $\theta$ is only
slightly below $\theta'$ the rejection of $H$ will not be considered a
serious error. Similarly, if $\theta$ is only slightly above $\theta'$
the acceptance of $H$ will not be considered a serious error. In
general, the importance of an error of the first kind will increase
steadily with decreasing value of $\theta$ in the domain
$\theta \leq \theta'$, and the importance of an error of the second
kind will increase steadily with increasing value of $\theta$ in the
domain $\theta > \theta'$. Thus, it will be possible to find two values
$\theta_0 < \theta'$ and $\theta_1 > \theta'$ such that an error of the
first kind is considered of practical importance whenever
$\theta \leq \theta_0$, and an error of the second kind is considered
of practical importance whenever $\theta \geq \theta_1$, whereas for
values $\theta$ between $\theta_0$ and $\theta_1$ we do not care
particularly which decision is made. Hence the zone of preference for
acceptance may be defined as consisting of all values
$\theta \leq \theta_0$, the zone of preference for rejection as the set
of values $\theta$ for which $\theta \geq \theta_1$, and the zone of
indifference as the set of all values $\theta$ for which
$\theta_0 < \theta < \theta_1$. In such a situation we shall want a
test procedure for which the probability of an error of the first kind
is less than or equal to a preassigned $\alpha$ whenever
$\theta \leq \theta_0$, and the probability of an error of the second
kind is less than or equal to a preassigned $\beta$ whenever
$\theta \geq \theta_1$. In most of the important cases occurring in
practice, such as when $x$ has a normal, binomial, or Poisson
distribution, and so on, the sequential probability ratio test of
strength $(\alpha, \beta)$ for testing the hypothesis that
$\theta = \theta_0$ against the single alternative that
$\theta = \theta_1$ will have the desired properties and provides a
satisfactory solution to the problem. If the sequential probability
ratio test leads to the acceptance of the hypothesis that
$\theta = \theta_0$, we accept the original hypothesis that
$\theta \leq \theta'$, and if the probability ratio test leads to the
rejection of the hypothesis that $\theta = \theta_0$, we reject the
original hypothesis that $\theta \leq \theta'$.

8 It is assumed here that there is only one unknown parameter $\theta$
involved in the distribution of $x$.

As an illustration, we shall discuss briefly one or two examples.
Suppose that a lot consisting of a large number of units of a
manufactured product is submitted for acceptance inspection. We shall
assume that each unit is classified in one of the two categories:
defective and non-defective. The proportion $p$ of defectives in the
lot is assumed to be unknown. The preference for acceptance or
rejection of the lot will, of course, depend on the value of $p$. It
will be possible, in general, to select two values of $p$, say $p_0$
and $p_1$ ($p_0 < p_1$), such that the rejection of the lot is
considered an error of practical importance whenever $p \leq p_0$, and
the acceptance of the lot is an error of practical importance whenever
$p \geq p_1$; for values $p$ between $p_0$ and $p_1$ we do not care
particularly which decision is made. Thus, the zone of preference for
acceptance is given by $p \leq p_0$, the zone of preference for
rejection by $p \geq p_1$, and the zone of indifference consists of
values $p$ for which $p_0 < p < p_1$. Hence, we shall want a test
procedure for which the probability of rejecting the lot is less than
or equal to a preassigned value $\alpha$ whenever $p \leq p_0$, and the
probability of accepting the lot is less than or equal to a preassigned
value $\beta$ whenever $p \geq p_1$. Such a test procedure is given by
the sequential probability ratio test of strength $(\alpha, \beta)$ for
testing the hypothesis that $p = p_0$ against the single alternative
that $p = p_1$. To compute the probability ratio $p_{1n}/p_{0n}$ for
this problem, we shall denote by $d_n$ the number of defectives found
in the first $n$ units inspected. The probability of obtaining a sample
equal to the observed one is given by

(4:10)    $$p_{1n} = p_1^{d_n} (1 - p_1)^{n - d_n}$$

when $p = p_1$, and by

(4:11)    $$p_{0n} = p_0^{d_n} (1 - p_0)^{n - d_n}$$

when $p = p_0$.⁹ Then

(4:12)    $$\log \frac{p_{1n}}{p_{0n}} = d_n \log \frac{p_1}{p_0} + (n - d_n) \log \frac{1 - p_1}{1 - p_0}$$

9 Formulas (4:10) and (4:11) are strictly valid only if the lot contains
infinitely many units. It is assumed that the lot contains a large
number of units so that these formulas can be used with good
approximation.

The test procedure is carried out as follows. We continue inspection
as long as $\log B < \log (p_{1n}/p_{0n}) < \log A$. If
$\log (p_{1n}/p_{0n}) \geq \log A$, inspection is terminated with the
rejection of the lot, and if $\log (p_{1n}/p_{0n}) \leq \log B$,
inspection is terminated with the acceptance of the lot. For practical
purposes we may put $A = (1 - \beta)/\alpha$ and
$B = \beta/(1 - \alpha)$.

A detailed discussion of the problem of acceptance inspection when
each unit is classified either as defective or as non-defective is
given in Part II in Chapter 5.
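The inspection procedure based on (4:12) can be sketched as follows. The function names and the representation of the inspected units as a sequence of booleans (`True` for a defective unit) are assumptions made for this illustration only; natural logarithms are used throughout.

```python
import math


def binomial_log_ratio(d_n, n, p0, p1):
    """log(p_1n / p_0n) from (4:12) after n units with d_n defectives."""
    return d_n * math.log(p1 / p0) + (n - d_n) * math.log((1 - p1) / (1 - p0))


def inspect_lot(units, p0, p1, alpha, beta):
    """Inspect units in order; returns (decision, number of units inspected)."""
    log_a = math.log((1 - beta) / alpha)   # log A
    log_b = math.log(beta / (1 - alpha))   # log B
    d_n = 0
    n = 0
    for n, defective in enumerate(units, start=1):
        if defective:
            d_n += 1
        z = binomial_log_ratio(d_n, n, p0, p1)
        if z >= log_a:
            return "reject", n    # lot rejected
        if z <= log_b:
            return "accept", n    # lot accepted
    return "continue", n          # no decision yet; more units are needed
```

With $p_0 = 0.1$, $p_1 = 0.3$, $\alpha = 0.05$, and $\beta = 0.10$, an unbroken run of non-defective units leads to acceptance at the ninth unit, while three defectives in a row lead to rejection.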

Another example for testing a hypothesis that $\theta \leq \theta'$ is
the case when $\theta$ is the unknown mean of a normal distribution
with known variance.¹⁰ Again it will be possible to select two values
$\theta_0 < \theta'$ and $\theta_1 > \theta'$ such that an error of the
first kind is considered of practical importance whenever
$\theta \leq \theta_0$, an error of the second kind is of practical
importance whenever $\theta \geq \theta_1$; for values $\theta$ between
$\theta_0$ and $\theta_1$ we do not care particularly which decision is
made. In such a situation we shall want a test procedure for which the
probability of committing an error of the first kind is less than or
equal to some preassigned value $\alpha$ whenever
$\theta \leq \theta_0$, and the probability of committing an error of
the second kind does not exceed a preassigned value $\beta$ whenever
$\theta \geq \theta_1$. These conditions will be satisfied by the
sequential probability ratio test of strength $(\alpha, \beta)$ for
testing the hypothesis that $\theta = \theta_0$ against the single
alternative hypothesis that $\theta = \theta_1$. The probability
density of the sample $(x_1, \ldots, x_n)$ is given by

(4:13)    $$p_{0n} = \frac{1}{(2\pi)^{n/2} \sigma^n}\, e^{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{n} (x_\alpha - \theta_0)^2}$$

when $\theta = \theta_0$, and by

(4:14)    $$p_{1n} = \frac{1}{(2\pi)^{n/2} \sigma^n}\, e^{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{n} (x_\alpha - \theta_1)^2}$$

when $\theta = \theta_1$. We continue taking observations as long as
$B < p_{1n}/p_{0n} < A$. If $p_{1n}/p_{0n} \geq A$, we reject the
hypothesis that $\theta \leq \theta'$, and if $p_{1n}/p_{0n} \leq B$ we
accept the hypothesis that $\theta \leq \theta'$. Again, we put
$A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$.

10 This problem is discussed in detail in Part II, Chapter 7.

4.2.2 Outline of the Test Procedure in the General Case

In testing a composite hypothesis $H_\omega$ that the parameter point
$\theta$ lies in a subset $\omega$ of the parameter space, the
parameter space is again subdivided into three mutually exclusive
zones: the zone of preference for acceptance $\omega_a$, the zone of
preference for rejection $\omega_r$, and the zone of indifference. The
zone of preference for acceptance will now consist of more than one
parameter point, as distinguished from the case of testing a simple
hypothesis.

For any test procedure the probability of an error of the first kind
(rejecting $H_\omega$ when $H_\omega$ is true) will, in general, vary
with the parameter point in $\omega$. For any parameter point $\theta$
in $\omega$ we shall denote by $\alpha(\theta)$ the probability that
$H_\omega$ will be rejected when $\theta$ is true. Similarly, the
probability of an error of the second kind (accepting $H_\omega$ when
it is false) is a function $\beta(\theta)$ defined for all points
$\theta$ outside $\omega$.

According to the requirements formulated in Section 2.3.2, we shall
want a test procedure such that $\alpha(\theta)$ will not exceed a
preassigned value $\alpha$ for all $\theta$ in the zone $\omega_a$, and
$\beta(\theta)$ will not exceed a preassigned value $\beta$ for all
$\theta$ in the zone $\omega_r$. Before discussing the problem of
finding a proper test procedure satisfying these requirements, we shall
again consider, as in the case of the simple hypothesis, the following
modified problem: Let $w_a(\theta)$ and $w_r(\theta)$ be two
non-negative functions of $\theta$, called weight functions, such that¹¹

(4:15)    $$\int_{\omega_a} w_a(\theta)\, d\theta = 1 \quad \text{and} \quad \int_{\omega_r} w_r(\theta)\, d\theta = 1$$

Suppose that we wish to construct a sequential test such that the
weighted average $\int_{\omega_a} \alpha(\theta)\, w_a(\theta)\, d\theta$
of the probabilities of errors of the first kind is equal to a given
value $\alpha$, and the weighted average
$\int_{\omega_r} \beta(\theta)\, w_r(\theta)\, d\theta$ of the
probabilities of errors of the second kind is a given value $\beta$.

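A proper sequential test satisfying these modified requirements can
be constructed as follows. Let $p_{0n}$ and $p_{1n}$ be defined by

(4:16)    $$p_{0n} = \int_{\omega_a} f(x_1, \theta_1, \ldots, \theta_k) \cdots f(x_n, \theta_1, \ldots, \theta_k)\, w_a(\theta)\, d\theta$$

and

(4:17)    $$p_{1n} = \int_{\omega_r} f(x_1, \theta_1, \ldots, \theta_k) \cdots f(x_n, \theta_1, \ldots, \theta_k)\, w_r(\theta)\, d\theta$$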

where $f(x, \theta_1, \ldots, \theta_k)$ denotes the probability
distribution of $x$ when $\theta$ is true. The functions $p_{0n}$ and
$p_{1n}$ can be interpreted as probability distributions of the sample
$(x_1, \ldots, x_n)$. Denote by $H_0^*$ the hypothesis that the
distribution of the sample $(x_1, \ldots, x_n)$ is given by (4:16), and
by $H_1^*$ the hypothesis that the distribution of
$(x_1, \ldots, x_n)$ is given by (4:17). The sequential probability
ratio test of strength $(\alpha, \beta)$ for testing $H_0^*$ against
$H_1^*$ provides a solution to our problem. If the constants $A$ and
$B$ in this sequential test are chosen so that the probability is
$\alpha$ that we reject $H_0^*$ when $H_0^*$ is true, and the
probability is $\beta$ that we accept $H_0^*$ when $H_1^*$ is true,
then for this sequential test we have

$$\int_{\omega_a} w_a(\theta)\, \alpha(\theta)\, d\theta = \alpha$$

and

$$\int_{\omega_r} w_r(\theta)\, \beta(\theta)\, d\theta = \beta$$

11 The weight functions $w_a(\theta)$ and $w_r(\theta)$ may also be
discrete. Formulas valid for both continuous and discrete weight
functions could be given by using Stieltjes integrals in (4:15) and
subsequent equations.


To make the strength of the test of $H_0^*$ against $H_1^*$ equal to
$(\alpha, \beta)$, again, for practical purposes, we may put
$A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$.
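When the weight functions are discrete, the integrals in (4:16) and (4:17) reduce to finite weighted sums, which makes the ratio $p_{1n}/p_{0n}$ easy to compute directly. The following Python sketch assumes discrete weights concentrated on a finite list of parameter points; the helper names (`normal_pdf`, `weighted_likelihood`, `weighted_ratio`) and the calling convention `pdf(x, theta)` are hypothetical choices for this illustration.

```python
import math


def normal_pdf(x, mean, sigma):
    """Normal density, used here only as an example of f(x, theta)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def weighted_likelihood(xs, points, weights, pdf):
    """Discrete form of (4:16)/(4:17): sum_j w_j * prod_i pdf(x_i, theta_j)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1, cf. (4:15)")
    total = 0.0
    for theta, w in zip(points, weights):
        total += w * math.prod(pdf(x, theta) for x in xs)
    return total


def weighted_ratio(xs, accept_points, accept_weights,
                   reject_points, reject_weights, pdf):
    """p_1n / p_0n with p_0n weighted over omega_a and p_1n over omega_r."""
    p0n = weighted_likelihood(xs, accept_points, accept_weights, pdf)
    p1n = weighted_likelihood(xs, reject_points, reject_weights, pdf)
    return p1n / p0n
```

Taking a single acceptance point $\theta_0$ and two rejection points $\theta_0 \pm \delta\sigma$ with weights $\tfrac{1}{2}$ each reproduces the ratio of (4:9) to (4:8) from Section 4.1.4. For long samples the plain products may underflow, so a practical version would work with logarithms.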

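To construct a sequential test procedure satisfying the requirements

(4:18)    $$\alpha(\theta) \leq \alpha \quad \text{for all } \theta \text{ in } \omega_a$$

and

(4:19)    $$\beta(\theta) \leq \beta \quad \text{for all } \theta \text{ in } \omega_r$$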

we shall restrict ourselves to sequential probability ratio tests for
which $p_{0n}$ and $p_{1n}$ are given by (4:16) and (4:17),
respectively, and $w_a(\theta)$ and $w_r(\theta)$ may be any weight
functions satisfying (4:15). Denote by $C$ the class of all such tests
corresponding to all possible weight functions $w_a(\theta)$ and
$w_r(\theta)$. To select a proper test from the class $C$ which
satisfies the requirements (4:18) and (4:19), our procedure will be
similar to that in the case of simple hypotheses, as discussed in
Section 4.1.3. A test in class $C$ is uniquely determined by the choice
of the constants $A$ and $B$ and by the weight functions $w_a(\theta)$
and $w_r(\theta)$. Thus, the maximum of $\alpha(\theta)$ with respect
to $\theta$ in the zone $\omega_a$, as well as the maximum of
$\beta(\theta)$ with respect to $\theta$ in the zone $\omega_r$, is
determined uniquely by $A$, $B$, $w_a(\theta)$, and $w_r(\theta)$.
Denote these maxima by $\alpha[A, B, w_a, w_r]$ and
$\beta[A, B, w_a, w_r]$, respectively. For given values $A$ and $B$,
the weight functions $w_a(\theta)$ and $w_r(\theta)$ may be regarded
the more desirable the smaller they make $\alpha[A, B, w_a, w_r]$ and
$\beta[A, B, w_a, w_r]$. Thus, if it is possible to find weight
functions $w_a(\theta)$ and $w_r(\theta)$ for which both
$\alpha[A, B, w_a, w_r]$ and $\beta[A, B, w_a, w_r]$ are simultaneously
minimized, they may be regarded as optimum weight functions. It is
shown in Section A.9 that in some important special cases, such as
testing the mean of a normal distribution with unknown variance,
optimum weight functions of the type described above do exist. However,
it is not known whether they generally exist. If it is not possible to
minimize both $\alpha[A, B, w_a, w_r]$ and $\beta[A, B, w_a, w_r]$
simultaneously, it may be reasonable to choose $w_a(\theta)$ and
$w_r(\theta)$ such that some average of the two values
$\alpha[A, B, w_a, w_r]$ and $\beta[A, B, w_a, w_r]$, or the maximum of
these two values, is minimized.

If the principle described above for choosing the weight functions
$w_a(\theta)$ and $w_r(\theta)$ is adopted, the maximum of
$\alpha(\theta)$ in the zone $\omega_a$ and the maximum of
$\beta(\theta)$ in the zone $\omega_r$ will depend only on $A$ and $B$.
Finally the constants $A$ and $B$ are determined so that these two
maxima are equal to $\alpha$ and $\beta$, respectively.

There is no general method yet available for constructing weight
functions $w_a(\theta)$ and $w_r(\theta)$ which are optimum in the
sense defined above. In some special cases, however, such weight
functions have been constructed.¹²

4.2.3 Application of the General Procedure to Testing the Mean 
of a Normal Distribution with Unknown Variance (Sequential 
t-Test) 

A frequent and important problem in applications is that of testing
the hypothesis $H$ that the unknown mean $\theta$ of a normal
distribution is equal to some specified value $\theta_0$ when nothing
is known about the variance $\sigma^2$ of the distribution. If the true
value $\theta$ differs only slightly from $\theta_0$, i.e., if
$|\theta - \theta_0|$ is only a small fraction of the standard
deviation $\sigma$, the acceptance of $H$ will usually not be
considered an error of practical consequence. However, the importance
of an error committed by accepting $H$ when $\theta \neq \theta_0$
will, in general, increase with increasing value of
$\left| \frac{\theta - \theta_0}{\sigma} \right|$. Thus, it will be
possible to find a positive value $\delta$ such that the acceptance of
$H$ is considered an error of practical importance only when
$\left| \frac{\theta - \theta_0}{\sigma} \right| \geq \delta$.
Accordingly, the three zones in the parameter space will be defined as
follows. The zone $\omega_a$ of preference for acceptance consists of
all points $(\theta, \sigma)$ for which $\theta = \theta_0$, i.e.,
$\omega_a$ consists of all points $(\theta_0, \sigma)$ where $\sigma$
can take any positive value. The zone $\omega_r$ of preference for
rejection consists of all points $(\theta, \sigma)$ for which
$\left| \frac{\theta - \theta_0}{\sigma} \right| \geq \delta$. Finally
the zone of indifference contains all points $(\theta, \sigma)$ for
which $0 < \left| \frac{\theta - \theta_0}{\sigma} \right| < \delta$.


12 See Section A.9.

The probability density of a sample $(x_1, \ldots, x_n)$ drawn from a
normal distribution with mean $\theta$ and standard deviation $\sigma$
is given by

(4:20)    $$p_n = \frac{1}{(2\pi)^{n/2} \sigma^n}\, e^{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{n} (x_\alpha - \theta)^2}$$


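As in the general procedure described in the preceding section, the
test procedure will be based on the ratio $p_{1n}/p_{0n}$ where
$p_{0n}$ is some weighted average value of $p_n$ corresponding to
various points $(\theta, \sigma)$ in $\omega_a$, and $p_{1n}$ is some
weighted average of $p_n$ corresponding to various points
$(\theta, \sigma)$ in $\omega_r$. It is shown in Section A.9 that by
choosing the weight functions $w_a(\theta)$ and $w_r(\theta)$ according
to the principles described in the preceding section we are led to the
following ratio:¹³

(4:21)    $$\frac{p_{1n}}{p_{0n}} = \frac{\dfrac{1}{2} \displaystyle\int_0^\infty \frac{1}{\sigma^n} \left[ e^{-\frac{1}{2\sigma^2} \sum (x_\alpha - \theta_0 - \delta\sigma)^2} + e^{-\frac{1}{2\sigma^2} \sum (x_\alpha - \theta_0 + \delta\sigma)^2} \right] d\sigma}{\displaystyle\int_0^\infty \frac{1}{\sigma^n}\, e^{-\frac{1}{2\sigma^2} \sum (x_\alpha - \theta_0)^2}\, d\sigma}$$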


The test procedure is then carried out as follows. Additional
observations are taken as long as $B < p_{1n}/p_{0n} < A$. The
hypothesis $H$ is rejected if $p_{1n}/p_{0n} \geq A$ and the hypothesis
$H$ is accepted if $p_{1n}/p_{0n} \leq B$. To satisfy the requirements
(4:18) and (4:19) for practical purposes we may let
$A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$.
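For small samples the ratio (4:21) can be evaluated numerically. The Python sketch below is a rough illustration only, not the tabulation method referred to in the text. It assumes the reading of (4:21) in which both integrals run over $\sigma$ from 0 to $\infty$ with weight $1/\sigma^n$, and it uses a substitution worked out here: with $S_1 = \sum (x_\alpha - \theta_0)$, $S_2 = \sum (x_\alpha - \theta_0)^2$, $t = \sqrt{S_2}/\sigma$, and $c = \delta S_1/\sqrt{S_2}$, the ratio becomes $e^{-n\delta^2/2} \int_0^\infty t^{n-2} e^{-t^2/2} \cosh(ct)\, dt \,/ \int_0^\infty t^{n-2} e^{-t^2/2}\, dt$. The function name, the midpoint rule, the truncation point, and the default step count are all arbitrary choices.

```python
import math


def sequential_t_ratio(xs, theta0, delta, steps=4000):
    """Numerical sketch of p_1n / p_0n in (4:21); requires n >= 2."""
    n = len(xs)
    if n < 2:
        raise ValueError("both integrals diverge for n < 2")
    s1 = sum(x - theta0 for x in xs)
    s2 = sum((x - theta0) ** 2 for x in xs)
    if s2 == 0.0:
        raise ValueError("all observations equal theta0; ratio is undefined")
    c = abs(delta * s1) / math.sqrt(s2)      # cosh is even, so the sign is irrelevant
    upper = c + math.sqrt(n) + 12.0          # integrands are negligible beyond this
    h = upper / steps
    ts = [(i + 0.5) * h for i in range(steps)]            # midpoints avoid t = 0
    log_den = [(n - 2) * math.log(t) - 0.5 * t * t for t in ts]
    log_num = [ld + c * t for ld, t in zip(log_den, ts)]  # dominant cosh branch
    m_num, m_den = max(log_num), max(log_den)             # offsets against overflow
    num = sum(0.5 * math.exp(l - m_num) * (1.0 + math.exp(-2.0 * c * t))
              for l, t in zip(log_num, ts))
    den = sum(math.exp(l - m_den) for l in log_den)
    log_ratio = -n * delta ** 2 / 2 + (m_num - m_den) + math.log(num / den)
    return math.exp(log_ratio)
```

Two checks follow from the formula itself: the ratio equals 1 when $\delta = 0$, and it equals $e^{-n\delta^2/2}$ whenever the observations are balanced about $\theta_0$ so that $S_1 = 0$. The step size cancels between numerator and denominator because both use the same grid.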

4.2.4 A Particular Class of Problems Treated by Girshick¹⁴

A class of problems treated by M. A. Girshick may be formulated as
follows. Let $x_1$ and $x_2$ be two independent random variables. The
distribution (elementary probability law) of $x_1$ is given by
$f(x_1, \theta_1)$ and that of $x_2$ by $f(x_2, \theta_2)$, where the
function $f$ is known but the values of the parameters $\theta_1$ and
$\theta_2$ are unknown. The problem is to test the hypothesis $H$ that
$\theta_1 \leq \theta_2$ against the alternative hypothesis $H'$ that
$\theta_1 > \theta_2$.

The type of problem described above occurs frequently in applications.
For example, let $x$ denote some quality characteristic, such as
hardness, tensile strength, or weight, of a manufactured product.
Suppose that the distribution of $x$ in the population of units
produced has a known functional form $f(x, \theta)$, but the value of
the parameter $\theta$ is unknown. Suppose, furthermore, that there are
two competing processes of production under consideration by the
manufacturer. Let $\theta_1$ denote the value of $\theta$ when process
1 is used, and $\theta_2$ when process 2 is used. Both values,
$\theta_1$ and $\theta_2$, are unknown. If the product is considered
the more desirable the greater the value of $\theta$, the problem of
deciding between the two competing processes reduces to that of testing
the hypothesis $H$ that $\theta_1 \leq \theta_2$. Process 1 is chosen
if $H$ is rejected, and process 2 is chosen if $H$ is accepted.

13 Considerable work on the evaluation of this ratio to bring it to a
suitable form for tabulation was done by K. Arnold while he was a
member of the Statistical Research Group of Columbia University. Tables
for the computation of this ratio have been prepared by the
Mathematical Tables Project, New York.

14 M. A. Girshick, "Contributions to the Theory of Sequential
Analysis," The Annals of Mathematical Statistics, Vol. 17 (1946).

The following procedure for testing the hypothesis $H$ has been
proposed by Girshick. We choose a particular value $\theta_1^0$ of
$\theta_1$ and a particular value $\theta_2^0$ of $\theta_2$ where
$\theta_1^0 < \theta_2^0$. Let $H_0$ denote the hypothesis that the
joint distribution of $x_1$ and $x_2$ is given by
$f(x_1, \theta_1^0) f(x_2, \theta_2^0)$, and let $H_1$ be the
alternative hypothesis that the joint distribution of $x_1$ and $x_2$
is given by $f(x_1, \theta_2^0) f(x_2, \theta_1^0)$. We then set up the
sequential probability ratio test for testing the simple hypothesis
$H_0$ against the simple alternative $H_1$. The hypothesis $H$ is
accepted or rejected accordingly as the sequential probability ratio
test leads to the acceptance or rejection of $H_0$. Thus, to carry out
the test procedure, two constants $A$ and $B$ are chosen and the ratio

(4:22)    $$\frac{p_{1m}}{p_{0m}} = \frac{f(x_{11}, \theta_2^0) f(x_{21}, \theta_1^0) \cdots f(x_{1m}, \theta_2^0) f(x_{2m}, \theta_1^0)}{f(x_{11}, \theta_1^0) f(x_{21}, \theta_2^0) \cdots f(x_{1m}, \theta_1^0) f(x_{2m}, \theta_2^0)}$$

is computed at each stage of the experiment. Here $x_{i\alpha}$ denotes
the $\alpha$th observation on $x_i$ ($i = 1, 2$). It is assumed that
the observations are taken in pairs, where each pair consists of an
observation on $x_1$ and an observation on $x_2$. Experimentation is
continued as long as the ratio $p_{1m}/p_{0m}$ lies between $B$ and
$A$. The hypothesis $H$ is accepted if $p_{1m}/p_{0m} \leq B$, and the
hypothesis $H$ is rejected if $p_{1m}/p_{0m} \geq A$.

It has been shown by Girshick that in many important cases the above
test procedure will have the following property: There exists a
function $v = v(\theta_1, \theta_2)$ such that $v$ may be regarded as a
reasonable measure of the difference between $\theta_1$ and $\theta_2$,
and the probability of accepting $H$ depends only on the value of $v$.
The function $v$ satisfies, furthermore, the conditions:
(1) $v(\theta_1, \theta_2) = 0$ when $\theta_1 = \theta_2$;
(2) $v(\theta_1, \theta_2) < 0$ when $\theta_2 > \theta_1$;
(3) $v(\theta_1, \theta_2) = -v(\theta_2, \theta_1)$.

If a function $v$ with the above properties exists, the choice of the
four quantities $\theta_1^0$, $\theta_2^0$, $A$, and $B$ may be made on
the basis of the following considerations: Let $\delta$ be a positive
value such that the acceptance of $H$ is regarded as an error of
practical importance whenever $v \geq \delta$, the rejection of $H$ is
regarded as an error of practical importance whenever $v \leq -\delta$;
for values $v$ between $-\delta$ and $\delta$ we do not care
particularly which decision is made. Thus, we shall want a test
procedure for which the probability of rejecting $H$ will not exceed a
preassigned value $\alpha$ whenever $v \leq -\delta$ and the
probability of accepting $H$ will not exceed a preassigned value
$\beta$ whenever $v \geq \delta$. The test procedure will have the
desired properties if the quantities $\theta_1^0$, $\theta_2^0$, $A$,
and $B$ are chosen so that $v(\theta_1^0, \theta_2^0) = -\delta$ and
the sequential probability ratio test for testing $H_0$ against $H_1$
has the strength $(\alpha, \beta)$. For all practical purposes we may
let $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$.

As an illustration, we shall consider the following example. Suppose
that one of two production processes is to be chosen. Suppose, further,
that the quality characteristic under consideration is normally
distributed with known mean and unknown standard deviation $\sigma_1$
when process 1 is used, and that the distribution is normal with the
same mean but unknown standard deviation $\sigma_2$ when process 2 is
used. The process that leads to a smaller standard deviation is
preferred. Thus, the manufacturer is interested in testing the
hypothesis $H$ that $\sigma_1 \leq \sigma_2$. There is no loss of
generality in assuming that the known means are equal to 0. Let $H_0$
be the hypothesis that $\sigma_1 = \sigma_1^0$ and
$\sigma_2 = \sigma_2^0$, and $H_1$ the hypothesis that
$\sigma_1 = \sigma_2^0$ and $\sigma_2 = \sigma_1^0$
($\sigma_1^0 < \sigma_2^0$). Then the probability ratio for testing
$H_0$ against $H_1$ is given by

(4:23)    $$\frac{p_{1m}}{p_{0m}} = e^{\frac{1}{2} \left( \frac{1}{(\sigma_1^0)^2} - \frac{1}{(\sigma_2^0)^2} \right) \sum_{\alpha=1}^{m} (x_{1\alpha}^2 - x_{2\alpha}^2)}$$

where $x_{i\alpha}$ denotes the $\alpha$th observation from the
population corresponding to process $i$.

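As Girshick has shown, the probability that the sequential probability 
ratio test of $H_0$ against $H_1$ will terminate with the acceptance 
of $H_0$ depends only on the value of 

(4:24)  $$v(\sigma_1, \sigma_2) = \frac{1}{2}\left(\frac{1}{\sigma_2^2} - \frac{1}{\sigma_1^2}\right)$$

This quantity may be regarded as a reasonable measure of the deviation 
of $\sigma_1$ from $\sigma_2$. Suppose we want a test procedure satisfying the 
following conditions: The probability of rejecting $H$ should not exceed 
$\alpha$ whenever $\frac{1}{2}\left(\frac{1}{\sigma_2^2} - \frac{1}{\sigma_1^2}\right) \le -\delta$, and the probability of accepting $H$ 
should not exceed $\beta$ whenever $\frac{1}{2}\left(\frac{1}{\sigma_2^2} - \frac{1}{\sigma_1^2}\right) \ge \delta$. Then we choose 
$\sigma_1^0$ and $\sigma_2^0$ so that 

(4:25)  $$\frac{1}{2}\left(\frac{1}{(\sigma_2^0)^2} - \frac{1}{(\sigma_1^0)^2}\right) = -\delta$$

The probability ratio given in (4:23) becomes then equal to 

(4:26)  $$\frac{p_{1m}}{p_{0m}} = \exp\left[\delta\sum_{\alpha=1}^{m}\left(x_{1\alpha}^2 - x_{2\alpha}^2\right)\right]$$

When $\frac{1}{\delta}\log\frac{p_{1m}}{p_{0m}} = \sum_{\alpha=1}^{m}\left(x_{1\alpha}^2 - x_{2\alpha}^2\right)$ is used instead of $\frac{p_{1m}}{p_{0m}}$, the test 
procedure can be carried out as follows. We continue taking pairs of 
observations as long as 

(4:27)  $$\frac{\log B}{\delta} < \sum_{\alpha=1}^{m}\left(x_{1\alpha}^2 - x_{2\alpha}^2\right) < \frac{\log A}{\delta}$$

We accept $H$ if 

(4:28)  $$\sum_{\alpha=1}^{m}\left(x_{1\alpha}^2 - x_{2\alpha}^2\right) \le \frac{\log B}{\delta}$$

and reject $H$ if 

(4:29)  $$\sum_{\alpha=1}^{m}\left(x_{1\alpha}^2 - x_{2\alpha}^2\right) \ge \frac{\log A}{\delta}$$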

PART II. APPLICATION OF THE GENERAL THEORY TO 
SPECIAL CASES 1 


Chapter 5. TESTING THE MEAN OF A BINOMIAL DISTRIBUTION 
(ACCEPTANCE INSPECTION OF A LOT WHERE 
EACH UNIT IS CLASSIFIED INTO ONE OF TWO CATEGORIES) 

5.1 Formulation of the Problem 

Let x be a random variable which can take only the values 0 and 1. 
Denote by p the (unknown) probability that x takes the value 1. We 
shall deal here with the problem of testing the hypothesis that p does 
not exceed some specified value p'. 

This problem arises, for example, in acceptance inspection of a lot 
consisting of a large number of units of a manufactured product. Sup- 
pose that each unit is classified in one of the two categories: defective 
and non-defective. We shall assign the value 0 to any non-defective 
unit and the value 1 to any defective unit. Let p denote the unknown 
proportion of defectives in the lot. Then the result x of the inspection 
of a unit drawn at random from the lot can take only the values 1 
and 0 with probabilities p and 1 — p, respectively. Usually it will be 
possible to specify some value p' such that we would like to accept the 
lot whenever p ≤ p' and we would like to reject the lot whenever 
p > p'. Thus, the problem of deciding whether the lot is to be accepted 
or rejected on the basis of a random sample may be formulated 
as the problem of testing the hypothesis p ≤ p' against the alternative 
hypothesis that p > p'. 

Since acceptance inspection of manufactured products is perhaps 
one of the most important applications of testing the mean of a 
binomial distribution, in what follows we shall use the terminology 
customary in acceptance inspection. This, of course, does not mean that 
the test procedure is not applicable to other cases as well. In the 
terminology of acceptance inspection, our problem may be stated as 
follows: A proper sampling plan (test procedure) is to be devised for 
deciding whether the lot submitted for inspection should be accepted 
or rejected. 

1 The special cases treated here are discussed mainly to illustrate the general 
theory and to bring out points of theoretical interest specific to these applications. 
Accordingly, computational procedures and simplifications are not stressed much 
and hardly any tables are given. A more detailed and non-mathematical discussion 
of these applications, together with a number of tables, charts, and computational 
simplifications, is contained in “Sequential Analysis of Statistical Data: Applications,” 
a report prepared by the Statistical Research Group of Columbia University 
and published by Columbia University Press, Sept., 1945. This report will be 
referred to hereafter simply as SRG 255. 

5.2 Tolerated Risks of Making Wrong Decisions 

Any sampling plan which does not provide for complete inspection 
of the lot may lead to a wrong decision. That is, we may accept the 
lot when p > p', or we may reject the lot when p ≤ p'. Since complete 
inspection is frequently not feasible, or too costly, we are willing 
to tolerate some risks of making wrong decisions. In order to devise 
a proper sampling plan, it is necessary to state the maximum risks of 
wrong decisions that we are willing to tolerate. 

If $p = p'$, the quality of the lot is just on the margin and we are 
indifferent which decision is made. For $p > p'$, we prefer to reject the 
lot and this preference increases with increasing value of $p$. For $p < p'$, 
we prefer to accept the lot and this preference increases with decreasing 
value of $p$. If $p$ is only slightly above $p'$, the preference for rejection 
is only slight and acceptance of the lot will not be regarded as an 
error of practical consequence. Similarly, if $p$ is only slightly below 
$p'$, rejection of the lot is not a serious error. Thus, it will be possible 
to specify two values $p_0$ and $p_1$, $p_0$ below $p'$ and $p_1$ above $p'$, such that 
acceptance of the lot is regarded as an error of practical consequence 
if (and only if) $p \ge p_1$, and rejection of the lot is regarded as an error 
of practical importance if (and only if) $p \le p_0$. If $p$ lies between $p_0$ 
and $p_1$ we do not care particularly which decision is made. 

After the two values $p_0$ and $p_1$ have been chosen, the risks of making 
wrong decisions which we are willing to tolerate may reasonably 
be formulated as follows: The probability of rejecting the lot should 
not exceed some small preassigned value $\alpha$ whenever $p \le p_0$, and the 
probability of accepting the lot should not exceed some small 
preassigned value $\beta$ whenever $p \ge p_1$. 

Thus, the tolerated risks are characterized by four numbers, $p_0$, $p_1$, 
$\alpha$, and $\beta$. The choice of these four quantities is not a statistical 
problem. They will be selected on the basis of practical considerations in 
each particular case. A proper sampling plan can be determined, as 
will be shown in the next section, after these four quantities have been 
chosen. 

5.3 The Sequential Probability Ratio Test Corresponding to the 
Quantities $p_0$, $p_1$, $\alpha$, and $\beta$ 

5.3.1 Derivation of Algebraic Formulas for the Test Criterion 

A sampling plan satisfying the conditions that the probability of 
rejecting the lot does not exceed $\alpha$ whenever $p \le p_0$, and the 
probability of accepting the lot does not exceed $\beta$ whenever $p \ge p_1$, is given 
by the sequential probability ratio test of strength $(\alpha, \beta)$ for testing 
the hypothesis $p = p_0$ against the hypothesis $p = p_1$. This test is 
defined as follows (see Section 3.1): Let $x_i$ denote the result of the 
inspection of the $i$th unit; that is, $x_i = 1$ if the $i$th unit inspected is 
found defective, and $x_i = 0$ otherwise. If $p$ denotes the proportion 
of defectives in the lot, the probability of obtaining a sample equal 
to the observed $(x_1, \ldots, x_m)$ is given by 

(5:1)  $$p^{d_m}(1 - p)^{m - d_m}$$

where $d_m$ denotes the number of defectives in the first $m$ units 
inspected. 2 Under the hypothesis that $p = p_1$ the probability (5:1) 
becomes equal to 

(5:2)  $$p_{1m} = p_1^{d_m}(1 - p_1)^{m - d_m}$$

and under the hypothesis that $p = p_0$ the probability (5:1) becomes 
equal to 

(5:3)  $$p_{0m} = p_0^{d_m}(1 - p_0)^{m - d_m}$$

The sequential probability ratio test is carried out as follows. At each 
stage of the inspection, at the inspection of the $m$th unit for each 
positive integral value $m$, we compute 

(5:4)  $$\log\frac{p_{1m}}{p_{0m}} = d_m\log\frac{p_1}{p_0} + (m - d_m)\log\frac{1 - p_1}{1 - p_0}$$

Inspection is continued as long as 

(5:5)  $$\log\frac{\beta}{1 - \alpha} < \log\frac{p_{1m}}{p_{0m}} < \log\frac{1 - \beta}{\alpha}$$

Inspection is terminated the first time that (5:5) does not hold. If 
at this final stage we have 

(5:6)  $$\log\frac{p_{1m}}{p_{0m}} \ge \log\frac{1 - \beta}{\alpha}$$

the lot is rejected, and if 

(5:7)  $$\log\frac{p_{1m}}{p_{0m}} \le \log\frac{\beta}{1 - \alpha}$$

the lot is accepted. 3 

2 The lot is assumed to be sufficiently large so that the successive observations 
$x_1, x_2, \ldots$, etc., may be regarded as independent. 

Inequalities (5:5), (5:6), and (5:7) can easily be seen to be equivalent 
to the following inequalities: 

(5:8)  $$\frac{\log\frac{\beta}{1-\alpha}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}} + m\,\frac{\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}} < d_m < \frac{\log\frac{1-\beta}{\alpha}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}} + m\,\frac{\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}}$$

(5:9)  $$d_m \ge \frac{\log\frac{1-\beta}{\alpha}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}} + m\,\frac{\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}}$$

and 

(5:10)  $$d_m \le \frac{\log\frac{\beta}{1-\alpha}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}} + m\,\frac{\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}}$$

For each value of $m$ we shall denote the right-hand member of (5:10) 
by $a_m$ and call it acceptance number. Similarly, we shall denote the 
right-hand member of (5:9) by $r_m$ and call it rejection number. For 
purposes of practical computations, the use of the inequalities (5:8), 
(5:9), and (5:10) seems to be much more convenient than the use of 
the original inequalities (5:5), (5:6), and (5:7). 4 On the basis of 
inequalities (5:8), (5:9), and (5:10), the sequential probability ratio test 
is carried out as follows. At each stage of the inspection we compute 
the acceptance number $a_m$ and the rejection number $r_m$. Inspection 
is continued as long as $a_m < d_m < r_m$. The first time that $d_m$ does not 
lie between the acceptance and rejection numbers, inspection is 
terminated. If $d_m \ge r_m$ the lot is rejected, and if $d_m \le a_m$ the lot is 
accepted. 

3 There is a slight approximation involved in the use of the constants $\log[\beta/(1 - \alpha)]$ 
and $\log[(1 - \beta)/\alpha]$. For further details see Section 3.3. 
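The procedure based on the acceptance and rejection numbers can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `binomial_sprt` and its return labels are assumptions made here, and the non-rounded values of $a_m$ and $r_m$ from (5:10) and (5:9) are used directly.

```python
import math

def binomial_sprt(xs, p0, p1, alpha, beta):
    """Item-by-item sequential test using a_m < d_m < r_m.

    xs is a sequence of 0/1 inspection results (1 = defective).
    Returns (decision, number_of_units_inspected).
    """
    g1 = math.log(p1 / p0)
    g2 = math.log((1 - p1) / (1 - p0))
    denom = g1 - g2
    h0 = math.log(beta / (1 - alpha)) / denom   # constant term of a_m
    h1 = math.log((1 - beta) / alpha) / denom   # constant term of r_m
    s = -g2 / denom                             # coefficient of m
    d = 0                                       # defectives found so far
    for m, x in enumerate(xs, start=1):
        d += x
        if d >= h1 + s * m:                     # d_m >= r_m, inequality (5:9)
            return "reject", m
        if d <= h0 + s * m:                     # d_m <= a_m, inequality (5:10)
            return "accept", m
    return "continue", len(xs)
```

With $p_0 = .1$, $p_1 = .3$, $\alpha = .02$, $\beta = .03$, a run of defect-free units is accepted at the 14th unit, which is the first $m$ for which $a_m \ge 0$.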


5.3.2 Tabular Procedure for Carrying Out the Test 

The acceptance number 

(5:11)  $$a_m = \frac{\log\frac{\beta}{1-\alpha}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}} + m\,\frac{\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}}$$

and the rejection number 

(5:12)  $$r_m = \frac{\log\frac{1-\beta}{\alpha}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}} + m\,\frac{\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}}$$

depend only on the quantities $p_0$, $p_1$, $\alpha$, and $\beta$. Thus, they can be 
computed and tabulated before inspection starts. If $a_m$ is not an 
integer, we may replace it by the largest integer $< a_m$. Similarly, if 
$r_m$ is not an integer, we may replace it by the smallest integer $> r_m$. 

As an illustration, consider the following example. Let $p_0 = .1$, 
$p_1 = .3$, $\alpha = .02$, and $\beta = .03$. The acceptance and rejection 
numbers, as well as the results of the observations, in an experiment are 
given in Table 5. In this example, inspection is terminated at $m = 22$ 
with the rejection of the lot. 

4 The use of the inequalities (5:8), (5:9), and (5:10) instead of (5:5), (5:6), and 
(5:7) was first suggested by J. H. Curtiss. In SRG 255 similar transformations 
of the inequalities defining the test procedure have been used in other problems 
as well. 

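TABLE 5 

 m (Number of     | a_m (Acceptance | d_m (Number of    | r_m (Rejection 
 Units Inspected) | Number)         | Defects Observed) | Number) 
------------------+-----------------+-------------------+----------------
  1               |                 | 0                 | 
  2               |                 | 0                 | 
  3               |                 | 1                 | 
  4               |                 | 1                 | 4 
  5               |                 | 1                 | 4 
  6               |                 | 1                 | 4 
  7               |                 | 1                 | 5 
  8               |                 | 1                 | 5 
  9               |                 | 2                 | 5 
 10               |                 | 2                 | 5 
 11               |                 | 3                 | 5 
 12               |                 | 4                 | 6 
 13               |                 | 4                 | 6 
 14               | 0               | 5                 | 6 
 15               | 0               | 5                 | 6 
 16               | 0               | 5                 | 6 
 17               | 0               | 5                 | 7 
 18               | 0               | 6                 | 7 
 19               | 0               | 6                 | 7 
 20               | 1               | 6                 | 7 
 21               | 1               | 6                 | 7 
 22               | 1               | 7                 | 7 
 23               | 1               |                   | 8 
 24               | 1               |                   | 8 
 25               | 2               |                   | 8 
 26               | 2               |                   | 8 
 27               | 2               |                   | 8 
 28               | 2               |                   | 9 
 29               | 2               |                   | 9 
 30               | 3               |                   | 9 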
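The integer acceptance and rejection numbers of Table 5 can be recomputed from (5:11) and (5:12). The sketch below is illustrative only: the names `decision_numbers` and `first_decision` are assumptions made here, and rounding is done with `math.floor` and `math.ceil`, which matches the replacement rule of Section 5.3.2 whenever $a_m$ and $r_m$ are not integers.

```python
import math

def decision_numbers(m, p0, p1, alpha, beta):
    """Return the integer (a_m, r_m) obtained from (5:11) and (5:12)."""
    g1 = math.log(p1 / p0)
    g2 = math.log((1 - p1) / (1 - p0))
    denom = g1 - g2
    a_m = math.log(beta / (1 - alpha)) / denom - m * g2 / denom
    r_m = math.log((1 - beta) / alpha) / denom - m * g2 / denom
    return math.floor(a_m), math.ceil(r_m)

def first_decision(ds, p0, p1, alpha, beta):
    """Scan cumulative defect counts d_1, d_2, ... for the first decision."""
    for m, d in enumerate(ds, start=1):
        a_m, r_m = decision_numbers(m, p0, p1, alpha, beta)
        if d >= r_m:
            return "reject", m
        if d <= a_m:
            return "accept", m
    return "continue", len(ds)

# d_m column of Table 5 for m = 1, ..., 22
observed_d = [0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 3,
              4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7]
print(first_decision(observed_d, 0.1, 0.3, 0.02, 0.03))
```

For small $m$ the computed $a_m$ is negative, and for $m < 4$ the computed $r_m$ exceeds $m$; these are the cells left blank in Table 5, since no decision of that kind is possible there.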


5.3.3 Graphical Procedure for Carrying Out the Test 

The test procedure can also be carried out graphically. The number 
$m$ of observations is measured along the horizontal axis and the 
number $d_m$ of defects along the vertical axis. The points $(m, a_m)$ lie 
on a straight line $L_0$, since $a_m$ is a linear function of $m$. Similarly the 
points $(m, r_m)$ lie on a straight line $L_1$. The intercept of $L_0$ is given by 

(5:13)  $$h_0 = \frac{\log\frac{\beta}{1-\alpha}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}}$$

and the intercept of $L_1$ is given by 

(5:14)  $$h_1 = \frac{\log\frac{1-\beta}{\alpha}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}}$$

The lines $L_0$ and $L_1$ are parallel and the common slope is equal to 

(5:15)  $$s = \frac{\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}}$$

The two straight lines $L_0$ and $L_1$ are drawn before inspection starts. 
The points $(m, d_m)$ are plotted as inspection goes on. We continue 
inspecting additional units as long as the point $(m, d_m)$ lies between 
the lines $L_0$ and $L_1$. Inspection is terminated the first time that the 
point $(m, d_m)$ does not lie between the lines $L_0$ and $L_1$. If $(m, d_m)$ lies 
on $L_0$ or below, the lot is accepted. If $(m, d_m)$ lies on $L_1$ or above, the 
lot is rejected. 

Figure 11 shows the graphical procedure for the example given in 
Section 5.3.2. 
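As a worked check of (5:13), (5:14), and (5:15), the intercepts and slope for the example of Section 5.3.2 ($p_0 = .1$, $p_1 = .3$, $\alpha = .02$, $\beta = .03$) may be evaluated as below. Natural logarithms are assumed here (the ratios do not depend on the base), and the values are rounded.

```latex
\begin{align*}
\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}
  &= \log 3 - \log\frac{0.7}{0.9} \approx 1.0986 + 0.2513 = 1.3499, \\
h_0 &= \frac{\log(0.03/0.98)}{1.3499} \approx \frac{-3.4864}{1.3499} \approx -2.583, \\
h_1 &= \frac{\log(0.97/0.02)}{1.3499} \approx \frac{3.8816}{1.3499} \approx 2.875, \\
s   &= \frac{\log(0.9/0.7)}{1.3499} \approx \frac{0.2513}{1.3499} \approx 0.186 .
\end{align*}
```

At $m = 22$ the rejection line is at about $2.875 + 22(0.186) \approx 6.97$, so the observed point $(22, 7)$ lies on or above $L_1$, in agreement with the rejection at $m = 22$ in Table 5.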

5.4 The Operating Characteristic (OC) Function $L(p)$ of the Test 5 

5.4.1 Determination of $L(p)$ for Some Special Values of $p$ 

As defined in Section 2.2.1, the value of the OC function $L(p)$ for 
each $p$ is equal to the probability that the lot will be accepted when 
$p$ is the true proportion of defectives in the lot. One can easily verify 
that 

(5:16)  $$L(0) = 1 \quad\text{and}\quad L(1) = 0$$

Since the test procedure is so set up that the probability is $1 - \alpha$ 
that the lot will be accepted when $p = p_0$, and the probability is $\beta$ 
that the lot will be accepted when $p = p_1$, we have 

(5:17)  $$L(p_0) = 1 - \alpha \quad\text{and}\quad L(p_1) = \beta$$

When 

$$p = s = \frac{\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}}$$

we obtain from equation (3:43) 

(5:18)  $$L(s) = \frac{\log\frac{1-\beta}{\alpha}}{\log\frac{1-\beta}{\alpha} + \left|\log\frac{\beta}{1-\alpha}\right|} = \frac{h_1}{h_1 + |h_0|}$$

where $h_0$ and $h_1$ are the intercepts of the lines $L_0$ and $L_1$. 6 

Thus, five points on the OC curve corresponding to $p = 0$, $1$, $p_0$, $p_1$, 
and s can immediately be determined. Since L(p) is monotonically 
decreasing with increasing p, the five points will determine fairly 
closely the shape of the whole OC curve. This will frequently be suf- 
ficient for practical purposes and there will be no need to compute L(p) 
for additional values of p. 
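For the example of Section 5.3.2 ($p_0 = .1$, $p_1 = .3$, $\alpha = .02$, $\beta = .03$), the five points can be written out as follows. The rounded values $h_0 \approx -2.583$, $h_1 \approx 2.875$, and $s \approx 0.186$ are obtained from (5:13), (5:14), and (5:15) with natural logarithms; this is a worked illustration only.

```latex
\begin{align*}
L(0) &= 1, \\
L(p_0) &= L(0.1) = 1 - \alpha = 0.98, \\
L(s) &\approx L(0.186) = \frac{h_1}{h_1 + |h_0|}
      \approx \frac{2.875}{2.875 + 2.583} \approx 0.527, \\
L(p_1) &= L(0.3) = \beta = 0.03, \\
L(1) &= 0 .
\end{align*}
```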

5 The formulas given in this section involve an approximation caused by neglecting 
the excess of $d_m$ over the boundaries $a_m$ and $r_m$ at the termination of the test 
procedure. For details see Sections 3.4 and A.2.3. An exact formula for $L(p)$ is given 
in Section 5.4.3 for the special case in which the slope $s$ of the decision lines is equal 
to the reciprocal of an integer. 

6 When $p = s$, the value of $h$ in formula (3:43) is equal to 0. The limiting value of 
the right-hand member of (3:43), when $h \to 0$, is equal to $\frac{\log A}{\log A + |\log B|}$, which is 
equal to the right-hand member of (5:18), since $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$. 

5.4.2 Determination of $L(p)$ over the Whole Range of $p$ 

It has been shown in Chapter 3, equations (3:45) and (3:46), that 7 

(5:19)  $$L(p) = \frac{\left(\frac{1-\beta}{\alpha}\right)^{h} - 1}{\left(\frac{1-\beta}{\alpha}\right)^{h} - \left(\frac{\beta}{1-\alpha}\right)^{h}}$$

where $h$ is determined by the equation 

(5:20)  $$p = \frac{1 - \left(\frac{1-p_1}{1-p_0}\right)^{h}}{\left(\frac{p_1}{p_0}\right)^{h} - \left(\frac{1-p_1}{1-p_0}\right)^{h}}$$

To compute the OC curve, it is not necessary to solve equation 
(5:20) in $h$. For any arbitrarily chosen value $h$, the values of $p$ and 
$L(p)$ may be computed from (5:19) and (5:20). The point $[p, L(p)]$ 
computed in this way will be a point on the OC curve. The OC curve 
can be drawn by plotting a sufficiently large number of points $[p, L(p)]$ 
corresponding to various values of $h$. Figure 12 shows a typical OC 
curve. 
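The parametric computation of points on the OC curve can be sketched as below. The function name `oc_point` is an assumption made here; the case $h = 0$ is handled by the limiting values $p = s$ and $L(s)$ of (5:18), since (5:19) and (5:20) take the form 0/0 there. Very large $|h|$ may overflow floating-point arithmetic, so a moderate range of $h$ is assumed.

```python
import math

def oc_point(h, p0, p1, alpha, beta):
    """Return the point (p, L(p)) on the OC curve for parameter h."""
    A = (1 - beta) / alpha
    B = beta / (1 - alpha)
    q = p1 / p0
    r = (1 - p1) / (1 - p0)
    if h == 0:
        # Limiting case: p equals the slope s and L(s) is given by (5:18).
        p = -math.log(r) / (math.log(q) - math.log(r))
        L = math.log(A) / (math.log(A) - math.log(B))
        return p, L
    p = (1 - r ** h) / (q ** h - r ** h)        # equation (5:20)
    L = (A ** h - 1) / (A ** h - B ** h)        # equation (5:19)
    return p, L

# A few points for p0 = .1, p1 = .3, alpha = .02, beta = .03
for h in (2, 1, 0.5, 0, -0.5, -1, -2):
    p, L = oc_point(h, 0.1, 0.3, 0.02, 0.03)
    print(f"h = {h:5.1f}   p = {p:.4f}   L(p) = {L:.4f}")
```

Choosing $h = 1$ and $h = -1$ returns the design points $(p_0, 1 - \alpha)$ and $(p_1, \beta)$, which gives a quick check on any implementation.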



The range of $h$ in (5:19) and (5:20) is from $-\infty$ to $+\infty$. It can be 
verified that the right-hand member of (5:19) is increasing with 
increasing $h$, and the right-hand member of (5:20) is decreasing with 
increasing $h$. The five values of $p$ considered in Section 5.4.1, that is, 
$p = 0, p_0, s, p_1, 1$, correspond to the values of $h = +\infty, 1, 0, -1, -\infty$, 
respectively, as can be seen from (5:20). Letting $h = +\infty, 1, 0, -1, 
-\infty$ in (5:19), we obtain the corresponding five values of $L(p)$ which 
coincide with those given in Section 5.4.1. 

7 In the formulas given in SRG 255, p. 2.50, the quantities $p$ and $L(p)$ are 
expressed in terms of another parameter $x$ which is functionally related to $h$. 

If the part of the OC curve corresponding to positive values of $h$ 
has been determined, the computation of the part of the OC curve 
corresponding to negative values of $h$ can be simplified. 8 To show this, 
let $h$ be a given positive value and let $[p, L(p)]$ be the corresponding 
point on the OC curve. Let $[p', L(p')]$ denote the point on the OC 
curve corresponding to $-h$. Then we have 

(5:21)  $$L(p') = \frac{\left(\frac{1-\beta}{\alpha}\right)^{-h} - 1}{\left(\frac{1-\beta}{\alpha}\right)^{-h} - \left(\frac{\beta}{1-\alpha}\right)^{-h}} = \frac{\left(\frac{\beta}{1-\alpha}\right)^{h}\left[1 - \left(\frac{1-\beta}{\alpha}\right)^{h}\right]}{\left(\frac{\beta}{1-\alpha}\right)^{h} - \left(\frac{1-\beta}{\alpha}\right)^{h}} = \left(\frac{\beta}{1-\alpha}\right)^{h} L(p)$$

Similarly, 

(5:22)  $$p' = \frac{1 - \left(\frac{1-p_1}{1-p_0}\right)^{-h}}{\left(\frac{p_1}{p_0}\right)^{-h} - \left(\frac{1-p_1}{1-p_0}\right)^{-h}} = \frac{\left(\frac{p_1}{p_0}\right)^{h}\left[\left(\frac{1-p_1}{1-p_0}\right)^{h} - 1\right]}{\left(\frac{1-p_1}{1-p_0}\right)^{h} - \left(\frac{p_1}{p_0}\right)^{h}} = \left(\frac{p_1}{p_0}\right)^{h} p$$

Thus, the point $[p', L(p')]$ corresponding to $-h$ can be computed from 
the point $[p, L(p)]$ corresponding to $h$ by using the simple relations 

$$p' = \left(\frac{p_1}{p_0}\right)^{h} p \quad\text{and}\quad L(p') = \left(\frac{\beta}{1-\alpha}\right)^{h} L(p).$$

8 A similar simplification is given in SRG 255, p. 2.50, with reference to the 
parameter $x$ used there. 

5.4.3 Exact Formula for L(p) When the Reciprocal of the Slope of 
the Decision Lines Is an Integer 

The quantity $z$, i.e., the logarithm of the probability ratio for 
a single observation, can take only the values $\log(p_1/p_0)$ and 
$\log[(1 - p_1)/(1 - p_0)]$. It follows from (5:15) that 

$$\log\frac{p_1}{p_0} = \left(\frac{1}{s} - 1\right)\log\frac{1-p_0}{1-p_1}$$

where $s$ is the slope of the decision lines. Assume that $1/s$ is an 
integer. Then the two values of $z$ are integral multiples of $d = 
\log[(1 - p_0)/(1 - p_1)]$, namely, $-d$ and $[(1/s) - 1]d$, and the results 
in the last part of Section A.4 can be used to determine the exact OC 
curve. 9 On the basis of these results one can show that 


- - 2 + 


U i 

hf (ik - 


m 


(m* - 


Up) = -T- 

8 

E 

i=> 1 


J 


i - 2 + m + [M^] 

H 

(m» — - u 3 ) 




where $A$ and $B$ are the constants used in the sequential test, 10 the 
symbol $[k]$ denotes the smallest integer $\ge k$, and $u_1, u_2, \ldots, u_{1/s}$ are the 
roots of the equation 

$$(1 - p)u + \frac{p}{u^{(1/s) - 1}} = 1$$


A different method for deriving an exact formula for $L(p)$ was given 
by M. A. Girshick in The Annals of Mathematical Statistics, Vol. 17 
(1946). His method does not require the computation of the roots 
$u_1, \ldots, u_{1/s}$. 

9 To reduce this case to the case discussed in the last part of Section A.4, one 
merely has to consider the test corresponding to $z^*$, $A^*$, and $B^*$ where $z^* = -z$, 
$\log A^* = -\log B$ and $\log B^* = -\log A$. 

10 To obtain a test of strength $(\alpha, \beta)$, we used the approximate values $A = 
(1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$. 

5.5 The Average Sample Number (ASN) Function of the Test 

Let $n$ denote the number of observations required by the test 
procedure. Then $n$ is a random variable, since it depends on the outcome 
of the observations. The expected value of $n$ depends on the proportion 
of defectives in the lot and is denoted by $E_p(n)$. This can be 
plotted as a curve, $p$ being measured along the horizontal axis and 
$E_p(n)$ along the vertical axis. A typical ASN curve is shown in Fig. 
13. This curve is called the ASN curve of the test (see Section 2.2.2 
for a general definition of the ASN curve). 



The general formula for the ASN function of a sequential probability 
ratio test is derived in Section 3.5. The approximation formula (3:57) 
applied to the binomial case gives 11 

(5:23)  $$E_p(n) = \frac{L(p)\log B + (1 - L(p))\log A}{p\log\frac{p_1}{p_0} + (1 - p)\log\frac{1-p_1}{1-p_0}}$$

where $A = (1 - \beta)/\alpha$, $B = \beta/(1 - \alpha)$, and $L(p)$ denotes the 
probability that inspection terminates with the acceptance of the lot. 
Using this formula, we shall compute $E_p(n)$ for $p = 0$, $p_0$, $p_1$, and 1. 
Since $L(0) = 1$, the value of $E_p(n)$ is given by 

(5:24)  $$E_p(n) = \frac{\log\frac{\beta}{1-\alpha}}{\log\frac{1-p_1}{1-p_0}}$$

when $p = 0$. For $p = p_0$, we have $L(p) = 1 - \alpha$ and we obtain from 
(5:23) 

(5:25)  $$E_{p_0}(n) = \frac{(1 - \alpha)\log\frac{\beta}{1-\alpha} + \alpha\log\frac{1-\beta}{\alpha}}{p_0\log\frac{p_1}{p_0} + (1 - p_0)\log\frac{1-p_1}{1-p_0}}$$

For $p = p_1$, we have $L(p) = \beta$ and we obtain from (5:23) 

(5:26)  $$E_{p_1}(n) = \frac{\beta\log\frac{\beta}{1-\alpha} + (1 - \beta)\log\frac{1-\beta}{\alpha}}{p_1\log\frac{p_1}{p_0} + (1 - p_1)\log\frac{1-p_1}{1-p_0}}$$

Since $L(1) = 0$, we obtain from (5:23) 

(5:27)  $$E_p(n) = \frac{\log\frac{1-\beta}{\alpha}}{\log\frac{p_1}{p_0}}$$

when $p = 1$. 

11 The right-hand member of (5:23) can be expressed as a function of $L(p)$, the 
intercepts, and the slope of the decision lines. See SRG 255, p. 2.63. 

Using formula (A:99) in the Appendix, we can compute the value 
of $E_p(n)$ when $p$ is equal to the common slope $s$ of the acceptance and 
rejection lines, i.e., when 12 

$$p = \frac{\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} - \log\frac{1-p_1}{1-p_0}} = s$$

From (A:99) we obtain 

(5:28)  $$E_s(n) = \frac{-\left(\log\frac{\beta}{1-\alpha}\right)\left(\log\frac{1-\beta}{\alpha}\right)}{E_s(z^2)}$$

where $E_s(z^2)$ is the expected value of $z^2$ and $z$ is a random variable 
which can take only the values $\log(p_1/p_0)$ and $\log[(1 - p_1)/(1 - p_0)]$ 
with probabilities $s$ and $1 - s$, respectively. Thus 

(5:29)  $$\begin{aligned}
E_s(z^2) &= s\left(\log\frac{p_1}{p_0}\right)^2 + (1 - s)\left(\log\frac{1-p_1}{1-p_0}\right)^2 \\
&= s\left[\left(\log\frac{p_1}{p_0}\right)^2 - \left(\log\frac{1-p_1}{1-p_0}\right)^2\right] + \left(\log\frac{1-p_1}{1-p_0}\right)^2 \\
&= \left(\log\frac{1-p_0}{1-p_1}\right)\left(\log\frac{p_1}{p_0} + \log\frac{1-p_1}{1-p_0}\right) + \left(\log\frac{1-p_1}{1-p_0}\right)^2 \\
&= \log\frac{p_1}{p_0}\,\log\frac{1-p_0}{1-p_1}
\end{aligned}$$

From (5:28) and (5:29) we obtain 

(5:30)  $$E_s(n) = \frac{-\left(\log\frac{\beta}{1-\alpha}\right)\left(\log\frac{1-\beta}{\alpha}\right)}{\log\frac{p_1}{p_0}\,\log\frac{1-p_0}{1-p_1}}$$

12 The value $s$ of $p$ corresponds to the value $\theta'$ in formula (A:99). It can be shown 
that $s$ lies between $p_0$ and $p_1$. Formula (A:99), and therefore also (5:28), involves 
an approximation caused by neglecting the excess of the cumulative sum over the 
boundaries. 

The determination of the five points of the ASN curve, as given in 
(5:24), (5:25), (5:26), (5:27), and (5:30), may frequently suffice in 
practice, since these five points already give a fairly good idea of the 
shape of the whole curve. The ASN curve generally increases as $p$ 
increases from 0 to $p_0$, and decreases as $p$ increases from $p_1$ to 1. In 
the interval $(p_0, p_1)$ the ASN curve generally increases as $p$ increases 
from $p_0$ to some value $p'$, and decreases as $p$ increases from $p'$ to $p_1$. 
The value $p'$ is generally equal to $s$ or is very near $s$. 

If it is desired to plot the ASN curve over the whole range of $p$, it 
is necessary first to compute the OC function $L(p)$. The value of 
$E_p(n)$ can then easily be determined from (5:23) for any value $p$. 

5.6 Observations Taken in Groups 

5.6.1 General Discussion 

For practical reasons it may sometimes be preferable to take the 
observations in groups, rather than singly. That is, the test procedure 
is carried out as follows. A group $g_1$ consisting of $v$ units is drawn 
from the lot. If the number of defectives $d_v$ in this group $g_1$ is less than 
or equal to the acceptance number $a_v$, inspection terminates with the 
acceptance of the lot. If $d_v$ is greater than or equal to the rejection 
number $r_v$, inspection terminates with the rejection of the lot. If 
$a_v < d_v < r_v$, a second group $g_2$ of $v$ units is drawn. Again, the lot is 
accepted if the total number of defectives $d_{2v}$ in the two groups is less 
than or equal to $a_{2v}$, the lot is rejected if $d_{2v} \ge r_{2v}$, and a third group 
$g_3$ of $v$ units is drawn if $a_{2v} < d_{2v} < r_{2v}$. This process is continued 
until either rejection or acceptance of the lot is decided. Thus, when 
the observations are taken in groups of $v$ units, the number $d_m$ of 
defectives found is compared with the corresponding acceptance 
number $a_m$ and rejection number $r_m$ only for $m = v, 2v, 3v, \ldots$, etc. 

The purpose of this section is to make some comments on the effect 
of grouping on the OC and ASN curves of the sampling plan. Clearly, 
grouping can only increase the number of observations required by the 
test. For, suppose that inspection terminates at the $n$th unit when 
observations are taken singly. If $n$ is equal to an integral multiple of 
$v$, i.e., $n = kv$, then the number of groups inspected, when observations 
are taken in groups, will be precisely equal to $k$, and the total number 
of units inspected will be the same as when observations are taken 
singly. However, if $kv < n < (k + 1)v$, grouping will cause an 
increase in the amount of inspection, since we shall have to inspect at 
least $(k + 1)$ groups, that is, at least $(k + 1)v$ units. It may even 
happen that we shall have to inspect more than $(k + 1)$ groups. This 
will be the case when $d_n$ lies outside the interval $(a_n, r_n)$, but 
$a_{(k+1)v} < d_{(k+1)v} < r_{(k+1)v}$. Thus, the increase in the expected number of 
units inspected caused by grouping may even exceed $v$ in some cases. 
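The effect described above can be tried out with a small sketch in which $d_m$ is compared with $a_m$ and $r_m$ only at multiples of the group size. The function name `grouped_binomial_sprt` and the parameter `group_size` (standing for $v$) are assumptions made here, and the non-rounded $a_m$ and $r_m$ are used.

```python
import math

def grouped_binomial_sprt(xs, group_size, p0, p1, alpha, beta):
    """Sequential test with d_m checked only at m = v, 2v, 3v, ...

    xs is a sequence of 0/1 results; group_size is the number v of
    units per group. Returns (decision, number_of_units_inspected).
    """
    g1 = math.log(p1 / p0)
    g2 = math.log((1 - p1) / (1 - p0))
    denom = g1 - g2
    h0 = math.log(beta / (1 - alpha)) / denom
    h1 = math.log((1 - beta) / alpha) / denom
    s = -g2 / denom
    d = 0
    checked = 0                                  # units covered by complete groups
    for m, x in enumerate(xs, start=1):
        d += x
        if m % group_size != 0:
            continue                             # group not yet complete
        checked = m
        if d >= h1 + s * m:
            return "reject", m
        if d <= h0 + s * m:
            return "accept", m
    return "continue", checked

# 0/1 outcomes implied by the d_m column of Table 5 (m = 1, ..., 22)
table5_xs = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
             1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
print(grouped_binomial_sprt(table5_xs, 1, 0.1, 0.3, 0.02, 0.03))
print(grouped_binomial_sprt(table5_xs + [0, 0, 0], 5, 0.1, 0.3, 0.02, 0.03))
```

With groups of 5 and three further non-defective units (hypothetical continuation data), $d_{22} = 7$ exceeds $r_{22}$ but $d_{25} = 7$ lies between $a_{25}$ and $r_{25}$, so inspection continues past the group containing the 22nd unit; this is the situation in which more than $(k + 1)$ groups are needed.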

Regarding the effect of grouping on the OC curve, the following 
remarks may be made. Putting $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$, 
the probability $\alpha'$ of rejecting the lot when $p = p_0$ and the probability 
$\beta'$ of accepting the lot when $p = p_1$ will be only approximately equal 
to $\alpha$ and $\beta$, respectively, even if the observations are taken singly. 
This was pointed out in Section 3.3, where the following inequalities 
were derived: 

(5:31)  $$\alpha' \le \frac{\alpha}{1 - \beta} \quad\text{and}\quad \beta' \le \frac{\beta}{1 - \alpha}$$

It can easily be verified that these inequalities also remain valid when 
the observations are taken in groups. The quantities $\alpha$ and $\beta$ are 
usually very small and $\alpha/(1 - \beta)$ and $\beta/(1 - \alpha)$ are very nearly equal 
to $\alpha$ and $\beta$, respectively. Thus, also in case of grouping, the realized 
values $\alpha'$ and $\beta'$ cannot exceed the intended values $\alpha$ and $\beta$, 
respectively, except by an exceedingly small quantity which can be neglected 
for all practical purposes. This means that, for all practical purposes, 
grouping will not decrease the protection against wrong decisions 
provided by the test. The only possible effect of practical significance that 
may be caused by grouping is that it may make $\alpha'$ or $\beta'$ substantially 
smaller than the intended values $\alpha$ and $\beta$. This feature of grouping 
compensates, to some extent, for the increase in the number of 
observations. 

It may be of interest to remark that, if the number $v$ of units in a 
group is equal to the reciprocal of the common slope $s$ of the 
acceptance and rejection lines and if the intercepts of these lines are integers, 
the OC curve is not affected at all by grouping. 13 This can be seen as 
follows: Because $s = 1/v$, we have $a_{m+v} = a_m + 1$ and $r_{m+v} = 
r_m + 1$. Furthermore, since the intercepts of the acceptance and 
rejection lines are assumed to be integers, $a_m$ and $r_m$ have integral values 
for any $m$ which is an integral multiple of $v$. If item-by-item 
inspection leads to acceptance of the lot at the $n$th item, then $n$ must be an 
integral multiple of $v$, and therefore inspection in groups of $v$ will also 
lead to acceptance. If item-by-item inspection leads to rejection of 
the lot at the $n$th item, then we have $d_n \ge r_n$. Let $n'$ be the smallest 
integral multiple of $v$ greater than or equal to $n$. Then $d_n \ge r_{n'}$, since 
$d_n$ and $r_{n'}$ are integers, $d_n \ge r_n$, and $r_{n'} - r_n < 1$. Hence $d_{n'} \ge r_{n'}$ and 
inspection in groups will also terminate with rejection of the lot. Thus, 
inspection in groups leads to exactly the same decision as item-by-item 
inspection and consequently grouping does not affect the OC curve. 

5.6.2 Upper and Lower Limits for the Effect of Grouping on the 
OC and ASN Curves 

Upper and lower limits for the effect of grouping on the OC and ASN 
curves can be obtained by considering the following three auxiliary 
sequential sampling plans. Let $h_0$ be the intercept of the acceptance 
line, $h_1$ the intercept of the rejection line, and $s$ the common slope in 
the given sampling plan. The first auxiliary plan is obtained by 
changing $h_0$ to $h_0^* = h_0 - vs$ and leaving $h_1$ and $s$ unchanged. The 
second auxiliary plan is obtained by changing $h_1$ to $h_1^* = h_1 + vs$, 
leaving $h_0$ and $s$ unchanged. Finally, the third auxiliary plan 
corresponds to the intercepts $h_0^*$, $h_1^*$, and slope $s$. Let $L_i(p)$ denote the 
OC function and $E_p^{(i)}(n)$ the ASN function of the auxiliary plan $i$, when 
item-by-item inspection is used ($i = 1, 2, 3$). Furthermore, let $L(p)$ 
denote the OC function and $E_p(n)$ the ASN function of the given plan 
when item-by-item inspection is used. When inspection is made in 
groups the OC and ASN functions are affected, 14 and we shall denote 
them by $\bar{L}(p)$ and $\bar{E}_p(n)$ respectively. 

13 See also SRG 255, p. 2.30. 

14 Except, in the case of the OC function, when the number of units in the group 
is the reciprocal of the slope, as stated in Section 5.6.1. 

It can easily be seen that whenever the first auxiliary plan (using 
item-by-item inspection) leads to the acceptance of the lot, the 
original plan (taking observations in groups) also leads to acceptance. 
The converse is, however, not necessarily true. That is, it may happen 
that the auxiliary plan leads to rejection of the lot, whereas the original 
plan leads to acceptance. Thus, we have 

(5:32)  $$L_1(p) \le \bar{L}(p)$$

Similarly, one can verify that whenever the second auxiliary plan 
(using item-by-item inspection) leads to rejection of the lot, the 
original plan (using grouping) also leads to rejection. Hence 

(5:33)  $$1 - L_2(p) \le 1 - \bar{L}(p)$$

This can be written as 

(5:34)  $$\bar{L}(p) \le L_2(p)$$

From (5:32) and (5:34) we obtain 

(5:35)  $$L_1(p) \le \bar{L}(p) \le L_2(p)$$

To derive an upper limit for $\bar{E}_p(n)$ we shall make use of the third 
auxiliary plan. If this plan (using item-by-item inspection) terminates 
at the inspection of the $n$th unit, the original plan (using grouping) 
must terminate at the latest with the inspection of the group in which 
the $n$th item is included. 15 Hence, the number $n'$ of units inspected 
when the original plan is used cannot exceed $n + v$. From this it 
follows that 

(5:36)  $$\bar{E}_p(n) \le E_p^{(3)}(n) + v$$

Since $\bar{E}_p(n) \ge E_p(n)$, we obtain the limits 

(5:37)  $$E_p(n) \le \bar{E}_p(n) \le E_p^{(3)}(n) + v$$

Limits for $\bar{L}(p)$ and $\bar{E}_p(n)$ could also be derived by using the method 
described in Sections A.2.3 and A.3.1 of the Appendix. The limits 
given in (5:35) and (5:37) will be rather close when $p_1/p_0$ and 
$(1 - p_1)/(1 - p_0)$ are near 1 and $vs$ does not exceed 1. 

15 It is possible, of course, that inspection terminates with an earlier group. 

5.7 Truncation of the Test Procedure 

The sequential sampling plan does not provide any definite upper 
bound for the number $n$ of units to be inspected. Any large value of 
$n$ is possible, but the probability is small that $n$ will exceed twice or 
three times its expected value. It is sometimes desirable to set a 
definite upper bound $n_0$ for $n$, excluding even a small probability that 
$n$ may exceed $n_0$. This can be done by truncating the sequential 
process at $n = n_0$. That is to say, we terminate the process at $n = n_0$ 
even if the regular sequential rule does not lead to a final decision for 
$n \le n_0$. The following seems to be a reasonable rule for deciding 
acceptance or rejection of the lot at $n = n_0$ if no decision is reached 
for $n \le n_0$ with the regular sequential procedure: If $d_{n_0} \ge 
(a_{n_0} + r_{n_0})/2$ we reject the lot, and if $d_{n_0} < (a_{n_0} + r_{n_0})/2$ we accept 
the lot. 

Truncation and its effect on the OC curve are discussed in Section 
3.8. If n 0 is put as high as three times the expected value of n, the 
effect of truncation on the OC curve is negligibly small, since the 
probability is nearly 1 that the regular sequential procedure will termi- 
nate for n < n 0 . 
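
The truncation rule above can be sketched in a few lines of Python. This is only an illustration: the function name `truncated_decision` and its argument names are assumptions introduced here, and the acceptance and rejection numbers at $n_0$ are taken as given inputs (they would come from the regular plan of Chapter 5).

```python
def truncated_decision(d_n0: float, a_n0: float, r_n0: float) -> str:
    """Forced decision at the truncation point n = n_0.

    d_n0: value of the test statistic d_n after n_0 units
    a_n0, r_n0: acceptance and rejection numbers of the regular plan at n_0
    """
    midpoint = (a_n0 + r_n0) / 2  # halfway between the two decision lines
    return "reject" if d_n0 >= midpoint else "accept"


# Hypothetical numbers, chosen only to exercise both branches.
print(truncated_decision(5, 2.0, 8.0))  # on the midpoint: the lot is rejected
print(truncated_decision(4, 2.0, 8.0))  # below the midpoint: the lot is accepted
```

The function is meant to be called only when the regular sequential rule has reached no decision up to $n_0$.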



Chapter 6. TESTING THE DIFFERENCE BETWEEN THE 
MEANS OF TWO BINOMIAL DISTRIBUTIONS (DOUBLE 
DICHOTOMIES) 

6.1 Formulation of the Problem 

Suppose that we want to compare the effectiveness of two production 
processes where the effectiveness of a production process is measured 
in terms of the proportion of effective units in the sequence produced. 
We shall say that a unit is effective if it has a certain desirable 
property, for example, if it withstands a certain strain. Let $p_1$ be the 
proportion of effectives if process 1 is used, and $p_2$ the proportion of 
effectives if process 2 is used. In other words, $p_1$ is the probability 
that a unit produced will be effective if process 1 is used, and $p_2$ is the 
probability that a unit produced will be effective if process 2 is used. 
Suppose that the manufacturer does not know the values of $p_1$ and 
$p_2$, and that process 1 is in operation. If $p_1 \ge p_2$, the manufacturer 
wants to retain process 1. However, if $p_1 < p_2$, especially if $p_1$ is 
substantially smaller than $p_2$, the manufacturer would like to replace 
process 1 by process 2. Thus, we are interested in testing the hypothesis 
that $p_1 \ge p_2$ against the alternative that $p_1 < p_2$. 

A more general formulation of the problem can be stated as follows. 
Consider two binomial distributions. Let $p_1$ be the probability of a 
success in a single trial according to the first binomial distribution, 
and let $p_2$ be the probability of a success in a single trial according to 
the second binomial distribution. We shall use the symbol 1 for success 
and the symbol 0 for failure. Suppose that the probabilities $p_1$ 
and $p_2$ are unknown. We consider the problem of testing the hypothesis 
that $p_1 \ge p_2$ on the basis of a sample consisting of $N_1$ observations 
from the first binomial distribution and $N_2$ observations from the 
second binomial distribution. Since in many experiments the case 
$N_1 = N_2$ is mainly of interest, and since this case (as we shall see 
later) makes an exact and simplified mathematical treatment of the 
problem possible, in what follows we shall assume that $N_1 = N_2 = N$ 
(say). Thus, on the basis of the outcome of the two series of $N$ 
independent trials we have to decide whether the hypothesis $p_1 \ge p_2$ 
should be accepted or rejected. 
6.2 The Classical Method 

The classical solution of the problem for large $N$ is given as follows. 
Let $S_1$ be the number of successes in the first set of $N$ trials (drawn 
from the first binomial distribution), and let $S_2$ be the number of 
successes in the second set of $N$ trials (drawn from the second binomial 
population). Denote $(S_1 + S_2)/2N$ by $\bar{p}$ and $1 - \bar{p}$ by $\bar{q}$. Then for 
large $N$ the expression 

(6:1)    $$ \frac{S_2 - S_1}{\sqrt{2N\bar{p}\bar{q}}} $$

is normally distributed with zero mean and unit variance if $p_1 = p_2$. 
Suppose that the level of significance we wish to choose is $\alpha$. Let $\lambda_\alpha$ 
be the value for which the probability that a normal variate with zero 
mean and unit variance will exceed $\lambda_\alpha$ is equal to $\alpha$. (For example, if 
$\alpha = .05$, $\lambda_\alpha = 1.64$.) Thus, if $p_1 = p_2$, the probability that the 
expression (6:1) will exceed $\lambda_\alpha$ is equal to $\alpha$. If $p_1 > p_2$, the probability 
that the expression (6:1) will exceed $\lambda_\alpha$ is less than $\alpha$. According to 
the classical method, the hypothesis that $p_1 \ge p_2$ is rejected if the 
observed value of (6:1) exceeds $\lambda_\alpha$. This method involves an 
approximation, since the distribution of (6:1) is not exactly normal (for small 
$N$ it is far from normal). For small $N$ an exact method has been 
proposed by R. A. Fisher which, however, involves cumbersome 
calculations. In Section 6.3 we shall suggest another (non-sequential) 
method which is exact and is fairly simple to apply as far as 
computations are concerned. The latter method has the further advantage 
of being suitable for sequential analysis, to which existing methods 
are not readily adaptable. 
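
As a rough illustration of the classical method, the following Python sketch computes the statistic under the assumption that expression (6:1) is $(S_2 - S_1)/\sqrt{2N\bar{p}\bar{q}}$ with $\bar{p} = (S_1 + S_2)/2N$. The function name and the sample counts are hypothetical, and the critical value 1.64 is the one quoted in the text for $\alpha = .05$.

```python
import math


def classical_statistic(s1: int, s2: int, n: int) -> float:
    """Large-sample statistic for comparing two binomial proportions.

    s1, s2: numbers of successes in the first and second set of n trials.
    Undefined (division by zero) when all 2n trials give the same outcome.
    """
    p_bar = (s1 + s2) / (2 * n)   # pooled proportion of successes
    q_bar = 1 - p_bar
    return (s2 - s1) / math.sqrt(2 * n * p_bar * q_bar)


# Hypothetical data: 40 and 55 successes in two sets of 100 trials.
z = classical_statistic(40, 55, 100)
print(z, "reject p1 >= p2" if z > 1.64 else "do not reject")
```

The approximation is only as good as the normal limit, so the sketch should not be relied on for small $N$.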

6.3 An Exact Non-Sequential Method 

Let $a_1, \ldots, a_N$ be the results in the first set of $N$ trials, and 
$b_1, \ldots, b_N$ the results in the second set of $N$ trials. These results 
are arranged in the order observed. Consider the sequence of $N$ pairs: 

(6:2)    $$ (a_1, b_1), \ldots, (a_N, b_N) $$

Let $t_1$ be the number of pairs (1, 0) and $t_2$ the number of pairs (0, 1) 
in this sequence. We consider only the pairs (0, 1) and (1, 0) and base 
the test on them. 

Let $a$ be the outcome of an observation from the first population, 
and $b$ the outcome of an observation from the second population. 
The probability that $(a, b) = (1, 0)$ is equal to $p_1(1 - p_2)$, and the 
probability that $(a, b) = (0, 1)$ is equal to $(1 - p_1)p_2$. Hence, knowing 
that $(a, b)$ is equal to one of the pairs (0, 1) and (1, 0), the 
(conditional) probability that it is equal to (0, 1) is given by 

(6:3)    $$ p = \frac{(1 - p_1)p_2}{p_1(1 - p_2) + p_2(1 - p_1)} $$

and the (conditional) probability that it is equal to (1, 0) is given by 

(6:4)    $$ 1 - p = \frac{p_1(1 - p_2)}{p_1(1 - p_2) + (1 - p_1)p_2} $$

Hence, when only the pairs (1, 0) and (0, 1) are considered, the variate 
$t_2$ is distributed like the number of successes in a sequence of 
$t = t_1 + t_2$ independent trials, the probability of a success in a single 
trial being equal to $p$. One can easily verify that $p = 1/2$ if $p_1 = p_2$, 
$p < 1/2$ if $p_1 > p_2$, and $p > 1/2$ if $p_1 < p_2$. Thus, the hypothesis to 
be tested, i.e., the hypothesis that $p_1 \ge p_2$, is equivalent to the 
hypothesis that $p \le 1/2$. We can therefore test the hypothesis that 
$p_1 \ge p_2$ by testing the hypothesis that $p \le 1/2$ on the basis of the 
observed value of $t_2$. Since the distribution of $t_2$ is the same as the 
distribution of the number of successes in $t = t_1 + t_2$ independent trials 
($t$ is treated as a constant and the probability of a success in a single 
trial is equal to $p$), the test procedure can be carried out in the usual 
manner. If we want a level of significance $\alpha$, a critical value $T$ is 
chosen so that for $p = 1/2$ the probability that $t_2 \ge T$ is equal to 
$\alpha$. The hypothesis that $p \le 1/2$ is rejected if and only if the 
observed $t_2$ is greater than or equal to the critical value $T$. The value 
of $T$ can be obtained from a table of the binomial distribution. If $t$ is 
large, $t_2$ is nearly normally distributed, and the critical value $T$ can 
be obtained from a table of the normal distribution. 
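
The procedure of this section can be sketched in Python as follows. The helper names are assumptions made for illustration. Because the binomial distribution is discrete, a tail probability exactly equal to $\alpha$ is usually not attainable, so the sketch takes the smallest $T$ whose tail probability does not exceed $\alpha$; that convention is a choice made here, not one stated in the text.

```python
from math import comb


def count_pairs(a, b):
    """Return (t1, t2): the numbers of pairs (1, 0) and (0, 1)."""
    t1 = sum(1 for x, y in zip(a, b) if (x, y) == (1, 0))
    t2 = sum(1 for x, y in zip(a, b) if (x, y) == (0, 1))
    return t1, t2


def upper_tail(t, k):
    """P(t2 >= k) when t2 is binomial with t trials and p = 1/2."""
    return sum(comb(t, j) for j in range(k, t + 1)) / 2 ** t


def critical_value(t, alpha):
    """Smallest T with P(t2 >= T | p = 1/2) <= alpha."""
    T = 0
    while upper_tail(t, T) > alpha:
        T += 1
    return T


def reject_p1_ge_p2(a, b, alpha):
    """True if the hypothesis p1 >= p2 is rejected at level alpha."""
    t1, t2 = count_pairs(a, b)
    return t2 >= critical_value(t1 + t2, alpha)
```

For instance, with $t = 20$ unlike pairs and $\alpha = .05$, `critical_value(20, 0.05)` gives 15, since the probability of 15 or more successes in 20 fair trials is 21700/1048576, about .021, while that of 14 or more is about .058.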

This procedure thus provides a simple test of the hypothesis that 
$p_1 \ge p_2$. The question arises whether the efficiency of this method is 
as high as that of the classical method. It would seem that the method 
suggested here cannot be a most efficient procedure, since the values 
of $t_1$ and $t_2$ depend on the order of the elements in the sequences 
$(a_1, \ldots, a_N)$ and $(b_1, \ldots, b_N)$, and there is no particular reason to 
arrange them in the order observed. However, it has been shown 1 
that the loss in efficiency as compared with the classical method is 
negligible if the number $N$ of trials is large. 2 


1 See the author’s report, Sequential Analysis of Statistical Data: Theory , sub- 
mitted to the Applied Mathematics Panel, National Defense Research Committee, 
Sept., 1943. 

2 The author believes that the loss in efficiency is slight even when N is small, 
although no exact investigation of this case has been made. 
It should be pointed out that the procedure for testing the hypothesis 
that $p_1 \ge p_2$ can be used also for testing the hypothesis that 
$p_1 = p_2$ if the alternative hypotheses are restricted to $p_2 > p_1$. 

In addition to simplicity and exactness, the present method seems 
superior to the classical one in the following respect. Suppose that 
(contrary to the original assumption) the probability of a success varies 
from trial to trial. Let $p_1^{(i)}$ denote the probability of success in the 
$i$th trial of the first set, and let $p_2^{(i)}$ denote the probability of success 
in the $i$th trial in the second set $(i = 1, \ldots, N)$. Assume that the 
probabilities $p_1^{(i)}$ and $p_2^{(i)}$ are entirely unknown and we wish to test 
the hypothesis that $p_1^{(1)} - p_2^{(1)} = \cdots = p_1^{(N)} - p_2^{(N)} = 0$. In this 
case the classical method is not applicable, but the present method 
provides a correct procedure. Such a situation may arise, for instance, 
if we want to test the hypothesis that the probability of a success 
(hitting the target) is the same for two different guns. In the course 
of the experiments the probability of a hit may change because of 
external conditions such as wind or disposition of the gunner. However, 
these external conditions are likely to affect both guns equally if the 
trials are made alternately (or approximately alternately), so that if 
the two guns are equally good we have $p_1^{(i)} = p_2^{(i)}$ $(i = 1, \ldots, N)$. 


6.4 Sequential Test of the Hypothesis That $p_1 \ge p_2$ 

6.4.1 Risks That We Are Willing to Tolerate of Making Wrong 
Decisions 

In order to devise a proper sequential test for testing the hypothesis 
that $p_1 \ge p_2$, we have to state first what risks of making wrong 
decisions we are willing to tolerate. The efficiency of production process 1 
may be measured by the ratio of effectives to ineffectives produced, 
i.e., by $k_1 = p_1/(1 - p_1)$. Production process 1 may be regarded the 
more efficient the larger the value of $k_1$. Similarly, the efficiency of 
production process 2 may be measured by $k_2 = p_2/(1 - p_2)$. The 
relative superiority of production process 2 over process 1 can then 
reasonably be measured by the ratio of $k_2$ to $k_1$, i.e., by 

(6:5)    $$ u = \frac{k_2}{k_1} = \frac{p_2(1 - p_1)}{p_1(1 - p_2)} $$

If $u = 1$, the two processes are equally good. If $u > 1$, process 2 is 
superior to process 1, and if $u < 1$, process 1 is superior to process 2. 
Thus, the manufacturer will, in general, be able to select two values 
of $u$, $u_0$ and $u_1$, say ($u_0 < u_1$), such that the rejection of process 1 in 
favor of process 2 is considered an error of practical importance 
whenever the true value of $u \le u_0$, and the maintenance of process 1 is 
considered an error of practical importance whenever $u \ge u_1$. If $u$ lies 
between $u_0$ and $u_1$, the manufacturer does not care particularly which 
decision is made. 

Clearly, we will always have $u_0 < u_1$. If the transition from 
production process 1 to process 2 involves some cost or other 
inconveniences, it seems reasonable to put $u_0 = 1$ (or $u_0$ may even be slightly 
greater than 1). This choice of $u_0$ really means that we consider the 
rejection of process 1 a serious error whenever this process is not 
inferior to process 2. On the other hand, if the transition from process 1 
to process 2 does not involve any inconveniences, the rejection of 
process 1 in favor of 2 cannot be a serious error when the two processes are 
equally efficient, i.e., when $u = 1$. Thus, in such a case it seems 
reasonable to choose $u_0$ somewhat below 1. 

After the quantities $u_0$ and $u_1$ have been chosen the risks that we 
are willing to tolerate may reasonably be expressed in the following 
form: The probability of rejecting process 1 should not exceed a 
preassigned value $\alpha$ whenever $u \le u_0$, and the probability of maintaining 
process 1 should not exceed a preassigned value $\beta$ whenever $u \ge u_1$. 
Thus, the risks that we are willing to tolerate are characterized by the 
four quantities $u_0$, $u_1$, $\alpha$, and $\beta$. 

6.4.2 The Sequential Probability Ratio Test Corresponding to the 
Quantities $u_0$, $u_1$, $\alpha$, and $\beta$ 

After the four quantities $u_0$, $u_1$, $\alpha$, and $\beta$ have been chosen, a proper 
sequential test can be carried out as follows. The (conditional) 
probability that we obtain a pair (0, 1), as given in (6:3), can be expressed 
as a function of $u$. In fact 

(6:6)    $$ p = \frac{(1 - p_1)p_2}{p_1(1 - p_2) + p_2(1 - p_1)} = \frac{\dfrac{p_2(1 - p_1)}{p_1(1 - p_2)}}{1 + \dfrac{p_2(1 - p_1)}{p_1(1 - p_2)}} = \frac{u}{1 + u} $$

Let $H_0$ denote the hypothesis that $p = u_0/(1 + u_0)$, and $H_1$ the 
hypothesis that $p = u_1/(1 + u_1)$. A proper sequential test satisfying 
our requirements concerning tolerated risks is the sequential 
probability ratio test of $H_0$ against $H_1$. The acceptance and rejection 
numbers for this sequential test can be obtained from (5:11) and (5:12) by 
substituting $u_0/(1 + u_0)$ for $p_0$, $u_1/(1 + u_1)$ for $p_1$, and $t = t_1 + t_2$ 
for $m$. 
Thus, for each value of $t$ the acceptance number is given by 

(6:7)    $$ a_t = \frac{\log\frac{\beta}{1 - \alpha}}{\log u_1 - \log u_0} + t\,\frac{\log\frac{1 + u_1}{1 + u_0}}{\log u_1 - \log u_0} $$

and the rejection number is given by 

(6:8)    $$ r_t = \frac{\log\frac{1 - \beta}{\alpha}}{\log u_1 - \log u_0} + t\,\frac{\log\frac{1 + u_1}{1 + u_0}}{\log u_1 - \log u_0} $$

These acceptance numbers $a_t$ and rejection numbers $r_t$ $(t = 1, 2, \ldots)$ 
are best tabulated before experimentation starts. The sequential test 
is then carried out as follows. The observations are taken in pairs 
where each pair consists of an observation from the first process and 
an observation from the second process. We continue taking pairs as 
long as $a_t < t_2 < r_t$. The first time that $t_2$ does not lie between the 
acceptance and rejection numbers, experimentation is terminated. 
Process 1 is maintained if at this final stage $t_2 \le a_t$, and process 1 is 
rejected in favor of 2 if $t_2 \ge r_t$. 

As an illustration, the following example is given. Let $u_0 = 1.3$, 
$u_1 = 3$, $\alpha = .03$, and $\beta = .10$. The observed pairs (0, 1) and (1, 0) 
in an experiment, and the rejection and acceptance numbers, are given 
in Table 6. In this example, the sampling process is terminated at 
$t = 18$ with the retention of process 1. 
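
The example can be retraced with a short Python sketch. It assumes the decision lines have the form $a_t = h_0 + st$ and $r_t = h_1 + st$ with $h_0 = \log\frac{\beta}{1-\alpha}/(\log u_1 - \log u_0)$, $h_1 = \log\frac{1-\beta}{\alpha}/(\log u_1 - \log u_0)$, and $s = \log\frac{1+u_1}{1+u_0}/(\log u_1 - \log u_0)$; the function names are introduced here for illustration only. The tabulated integers of Table 6 appear to be $a_t$ rounded down and $r_t$ rounded up, which is an inference from the table rather than a statement in the text.

```python
import math


def decision_lines(u0, u1, alpha, beta):
    """Return (h0, h1, s): the two intercepts and the common slope."""
    g = math.log(u1) - math.log(u0)
    h0 = math.log(beta / (1 - alpha)) / g
    h1 = math.log((1 - beta) / alpha) / g
    s = math.log((1 + u1) / (1 + u0)) / g
    return h0, h1, s


def run_test(pairs, u0, u1, alpha, beta):
    """Process pairs (a, b) in order; return (decision, t at termination)."""
    h0, h1, s = decision_lines(u0, u1, alpha, beta)
    t = t2 = 0
    for a, b in pairs:
        if a == b:            # pairs (0, 0) and (1, 1) are disregarded
            continue
        t += 1
        t2 += 1 if (a, b) == (0, 1) else 0
        if t2 <= h0 + s * t:
            return "maintain process 1", t
        if t2 >= h1 + s * t:
            return "reject process 1", t
    return "no decision yet", t


# The 18 unlike pairs listed in Table 6, in the order observed.
table6_pairs = [(0, 1), (0, 1), (1, 0), (1, 0), (1, 0), (0, 1), (1, 0),
                (0, 1), (0, 1), (1, 0), (0, 1), (0, 1), (0, 1), (1, 0),
                (1, 0), (0, 1), (1, 0), (1, 0)]
print(run_test(table6_pairs, 1.3, 3, 0.03, 0.10))
```

With these inputs the sketch stops at $t = 18$ with process 1 maintained, in agreement with the text.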

                               TABLE 6 

   t            Pair         a_t           t_2            r_t 
 (number of    observed    (acceptance   (number of     (rejection 
 pairs (0, 1),              number)      pairs (0, 1)    number) 
 (1, 0)                                   observed) 
 observed) 
 ------------------------------------------------------------------ 
    1          (0, 1)                        1 
    2          (0, 1)                        2 
    3          (1, 0)                        2 
    4          (1, 0)                        2 
    5          (1, 0)          0             2 
    6          (0, 1)          1             3 
    7          (1, 0)          1             3 
    8          (0, 1)          2             4 
    9          (0, 1)          3             5 
   10          (1, 0)          3             5 
   11          (0, 1)          4             6 
   12          (0, 1)          5             7 
   13          (0, 1)          5             8              13 
   14          (1, 0)          6             8              14 
   15          (1, 0)          7             8              14 
   16          (0, 1)          7             9              15 
   17          (1, 0)          8             9              16 
   18          (1, 0)          9             9              16 
   19                          9                            17 
   20                         10                            18 
   21                         11                            18 
   22                         11                            19 
   23                         12                            20 
   24                         13                            20 
   25                         13                            21 
   26                         14                            22 
   27                         15                            22 
   28                         15                            23 
   29                         16                            24 

The test procedure can also be carried out graphically as shown in 
Fig. 14. The total number $t$ of pairs (0, 1) and (1, 0) is measured along 
the horizontal axis. The points $(t, a_t)$ will lie on a straight line $L_0$, 
since $a_t$ is a linear function of $t$. The points $(t, r_t)$ will lie on a parallel 
line $L_1$. We draw the lines $L_0$ and $L_1$ and plot the points $(t, t_2)$ as 
experimentation goes on. The first time that the point $(t, t_2)$ is not 
within the lines $L_0$ and $L_1$ experimentation is terminated. Process 1 
is maintained if at the final stage $(t, t_2)$ lies on $L_0$ or below, and 
process 1 is rejected if $(t, t_2)$ lies on $L_1$ or above. 

Fig. 14 
The intercept of line $L_0$ is given by 

(6:9)    $$ h_0 = \frac{\log\frac{\beta}{1 - \alpha}}{\log u_1 - \log u_0} $$

and the intercept of $L_1$ is given by 

(6:10)    $$ h_1 = \frac{\log\frac{1 - \beta}{\alpha}}{\log u_1 - \log u_0} $$

The common slope of the two lines is equal to 

(6:11)    $$ s = \frac{\log\frac{1 + u_1}{1 + u_0}}{\log u_1 - \log u_0} $$

6.4.3 The Operating Characteristic Curve of the Test 

For any value $u$ of the ratio $k_2/k_1$, we shall denote by $L(u)$ the 
probability of maintaining process 1. Clearly, $L(u)$ is a function of $u$. 
This function $L(u)$ is called the operating characteristic function of the 
test. It can be obtained from equations (5:19) and (5:20) by 
substituting $u_0/(1 + u_0)$ for $p_0$ and $u_1/(1 + u_1)$ for $p_1$. These equations 
become 3 

(6:12)    $$ L(u) = \frac{\left(\frac{1 - \beta}{\alpha}\right)^h - 1}{\left(\frac{1 - \beta}{\alpha}\right)^h - \left(\frac{\beta}{1 - \alpha}\right)^h} $$

(6:13)    $$ \frac{u}{1 + u} = \frac{1 - \left(\frac{1 + u_0}{1 + u_1}\right)^h}{\left(\frac{u_1(1 + u_0)}{u_0(1 + u_1)}\right)^h - \left(\frac{1 + u_0}{1 + u_1}\right)^h} $$

Equation (6:13) can be written as 

(6:14)    $$ u = \frac{1 - \left(\frac{1 + u_0}{1 + u_1}\right)^h}{\left(\frac{u_1(1 + u_0)}{u_0(1 + u_1)}\right)^h - 1} $$

For any given value $h$ we compute $u$ and $L(u)$ from the equations 
(6:12) and (6:14). The point $[u, L(u)]$ obtained in this way will be a 
point of the OC curve. By calculating the points $[u, L(u)]$ for a 
sufficiently large number of values of $h$, the OC curve can be drawn. 

3 In the formulas given in SRG 255, p. 3.38, the quantities $u$ and $L(u)$ are 
expressed in terms of a "dummy" variable $x$ which is functionally related to $h$. 

We shall compute $[u, L(u)]$ for $h = -\infty, -1, 0, 1, +\infty$. Since 
$\frac{1 + u_0}{1 + u_1} < 1$ and $\frac{u_1(1 + u_0)}{u_0(1 + u_1)} > 1$, we obtain from (6:12) and (6:14) 

(6:15)    $u = \infty$ and $L(u) = 0$ when $h = -\infty$ 

(6:16)    $u = 0$ and $L(u) = 1$ when $h = +\infty$ 

Furthermore we obtain 

(6:17)    $u = u_1$ and $L(u) = \beta$ when $h = -1$ 

and 

(6:18)    $u = u_0$ and $L(u) = 1 - \alpha$ when $h = +1$ 

For $h = 0$, the expressions $u$ and $L(u)$ have the form 0/0. The 
limiting values of $u$ and $L(u)$ when $h \to 0$ can be obtained by 
differentiating numerator and denominator at $h = 0$. Then we have 

(6:19)    $$ u = \frac{\log\frac{1 + u_1}{1 + u_0}}{\log\frac{u_1(1 + u_0)}{u_0(1 + u_1)}} \quad\text{and}\quad L(u) = \frac{\log\frac{1 - \beta}{\alpha}}{\log\frac{1 - \beta}{\alpha} - \log\frac{\beta}{1 - \alpha}} $$

when $h = 0$. 

These five points on the OC curve already determine roughly the 
shape of the curve. It can be seen that $u$ is a decreasing function of 
$h$ and $L(u)$ is an increasing function of $h$. Hence $L(u)$ is a decreasing 
function of $u$. As $u$ varies from 0 to $u_0$, $L(u)$ decreases from 1 to 
$1 - \alpha$. In the interval from $u_0$ to $u_1$, $L(u)$ decreases from $1 - \alpha$ to 
$\beta$, and as $u$ varies from $u_1$ to $+\infty$, the OC function $L(u)$ decreases 
from $\beta$ to 0. 
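
A point of the OC curve can be computed for any $h$ with the following Python sketch. It assumes the parametric form $u = [1 - c^h]/[b^h - 1]$ and $L(u) = [A^h - 1]/[A^h - B^h]$, where $c = (1+u_0)/(1+u_1)$, $b = u_1(1+u_0)/[u_0(1+u_1)]$, $A = (1-\beta)/\alpha$, and $B = \beta/(1-\alpha)$; the function names are hypothetical. The numerical inputs in the usage lines are those of the example of Section 6.4.2.

```python
import math


def oc_point(h, u0, u1, alpha, beta):
    """Return (u, L(u)) for a parameter value h different from 0."""
    c = ((1 + u0) / (1 + u1)) ** h
    b = (u1 * (1 + u0) / (u0 * (1 + u1))) ** h
    u = (1 - c) / (b - 1)
    A = ((1 - beta) / alpha) ** h
    B = (beta / (1 - alpha)) ** h
    return u, (A - 1) / (A - B)


def oc_point_at_zero(u0, u1, alpha, beta):
    """Limiting point (u, L(u)) as h tends to 0."""
    u = math.log((1 + u1) / (1 + u0)) / math.log(u1 * (1 + u0) / (u0 * (1 + u1)))
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    return u, log_a / (log_a - log_b)


for h in (-1, 1):
    print(h, oc_point(h, 1.3, 3, 0.03, 0.10))
print(0, oc_point_at_zero(1.3, 3, 0.03, 0.10))
```

For $h = 1$ and $h = -1$ the sketch returns the points $(u_0, 1 - \alpha)$ and $(u_1, \beta)$, up to rounding error. Very large values of $|h|$ may overflow floating-point arithmetic, so the two infinite cases are best taken from the limits stated in the text.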


6.4.4 The Average Amount of Inspection Required by the Test 

For any value $u$ of the ratio $k_2/k_1$, let $E_u(t)$ denote the expected 
value of the total number of pairs (0, 1) and (1, 0) required by the 
test. The value of $E_u(t)$ can be obtained from (5:23) by substituting 
$E_u(t)$ for $E_p(n)$, $L(u)$ for $L(p)$, $u_0/(1 + u_0)$ for $p_0$, $u_1/(1 + u_1)$ for $p_1$, 
and $u/(1 + u)$ for $p$. Thus 4 

(6:20)    $$ E_u(t) = \frac{L(u)\log\frac{\beta}{1 - \alpha} + [1 - L(u)]\log\frac{1 - \beta}{\alpha}}{\frac{u}{1 + u}\log\frac{u_1(1 + u_0)}{u_0(1 + u_1)} + \frac{1}{1 + u}\log\frac{1 + u_0}{1 + u_1}} $$

To compute the expected value of the total number of pairs 
(including also the pairs (0, 0) and (1, 1)), we merely have to divide the 
right-hand side of equation (6:20) by $p_1(1 - p_2) + p_2(1 - p_1)$. 
Since $L(0) = 1$ and $L(\infty) = 0$, we obtain from (6:20) 

(6:21)    $$ E_u(t) = \frac{\log\frac{\beta}{1 - \alpha}}{\log\frac{1 + u_0}{1 + u_1}} \quad\text{when } u = 0 $$

and 

(6:22)    $$ E_u(t) = \frac{\log\frac{1 - \beta}{\alpha}}{\log\frac{u_1(1 + u_0)}{u_0(1 + u_1)}} \quad\text{when } u = \infty $$

Since $L(u_0) = 1 - \alpha$ and $L(u_1) = \beta$, it follows from (6:20) that 

(6:23)    $$ E_{u_0}(t) = \frac{(1 - \alpha)\log\frac{\beta}{1 - \alpha} + \alpha\log\frac{1 - \beta}{\alpha}}{\frac{u_0}{1 + u_0}\log\frac{u_1(1 + u_0)}{u_0(1 + u_1)} + \frac{1}{1 + u_0}\log\frac{1 + u_0}{1 + u_1}} $$

and 

(6:24)    $$ E_{u_1}(t) = \frac{\beta\log\frac{\beta}{1 - \alpha} + (1 - \beta)\log\frac{1 - \beta}{\alpha}}{\frac{u_1}{1 + u_1}\log\frac{u_1(1 + u_0)}{u_0(1 + u_1)} + \frac{1}{1 + u_1}\log\frac{1 + u_0}{1 + u_1}} $$

In Section 5.5 we have computed the expected value of $n$ when $p$ 
is equal to the slope of the acceptance and rejection lines. This 
corresponds to the case when $u/(1 + u) = s$, i.e., $u = s/(1 - s)$, where the 
slope $s$ is given in (6:11). The value of $E_u(t)$ for $u = s/(1 - s)$ can 
be obtained from the right-hand member of (5:30), replacing $p_1$ by 
$u_1/(1 + u_1)$ and $p_0$ by $u_0/(1 + u_0)$. Thus 

(6:25)    $$ E_{s/(1-s)}(t) = \frac{-\left(\log\frac{\beta}{1 - \alpha}\right)\left(\log\frac{1 - \beta}{\alpha}\right)}{\log\frac{u_1(1 + u_0)}{u_0(1 + u_1)}\,\log\frac{1 + u_1}{1 + u_0}} $$

4 The right-hand member of (6:20) can be expressed as a function of $L(u)$, the 
intercepts and the slope of the decision lines. See SRG 255, p. 3.41. 
The determination of the five values of $E_u(t)$, as given in (6:21) 
through (6:25), may frequently suffice in practice, since these five 
points generally give a fairly good idea of the shape of the whole curve. 
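
The five values can be evaluated numerically with a sketch such as the one below. It assumes that (6:20) has numerator $L(u)\log\frac{\beta}{1-\alpha} + [1 - L(u)]\log\frac{1-\beta}{\alpha}$ and denominator $\frac{u}{1+u}\log\frac{u_1(1+u_0)}{u_0(1+u_1)} + \frac{1}{1+u}\log\frac{1+u_0}{1+u_1}$, and that the value at $u = s/(1-s)$ is the negative product of the two logarithms of the numerator divided by $\log\frac{u_1(1+u_0)}{u_0(1+u_1)}\log\frac{1+u_1}{1+u_0}$. Function names are hypothetical, and the inputs are those of the example of Section 6.4.2.

```python
import math


def expected_pairs(u, L, u0, u1, alpha, beta):
    """Expected number of unlike pairs for finite u, given L = L(u)."""
    num = L * math.log(beta / (1 - alpha)) + (1 - L) * math.log((1 - beta) / alpha)
    den = (u / (1 + u)) * math.log(u1 * (1 + u0) / (u0 * (1 + u1))) \
        + (1 / (1 + u)) * math.log((1 + u0) / (1 + u1))
    return num / den


def expected_pairs_at_slope(u0, u1, alpha, beta):
    """Expected number of unlike pairs when u = s / (1 - s)."""
    num = -math.log(beta / (1 - alpha)) * math.log((1 - beta) / alpha)
    den = math.log(u1 * (1 + u0) / (u0 * (1 + u1))) * math.log((1 + u1) / (1 + u0))
    return num / den


u0, u1, alpha, beta = 1.3, 3, 0.03, 0.10
print(expected_pairs(0, 1, u0, u1, alpha, beta))            # u = 0
print(expected_pairs(u0, 1 - alpha, u0, u1, alpha, beta))   # u = u0
print(expected_pairs_at_slope(u0, u1, alpha, beta))         # u = s/(1 - s)
print(expected_pairs(u1, beta, u0, u1, alpha, beta))        # u = u1
```

The case $u = \infty$ is not covered by `expected_pairs`; there the limit $\log\frac{1-\beta}{\alpha} / \log\frac{u_1(1+u_0)}{u_0(1+u_1)}$ can be evaluated directly.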

6.4.5 Observations Taken in Groups 

In applications it may happen that, at each stage in the sequential 
process, instead of drawing a single observation we draw a group of 
v observations from each of the binomial distributions. Hence, instead 
of a single pair, we have two groups of v observations. The effect of 
grouping on the OC and ASN curves has been discussed in Section 5.6 
and the results obtained there can be applied to the case under con- 
sideration here. If the order of observations in each group of v is re- 
corded, we can establish the number of pairs (0, 1) and the number of 
pairs (1, 0) for each pair of groups of v observations. In such a case 
the test can be carried out as described in Section 6.4.2, since after 
each pair of groups of v observations we can compute t and t 2 . How- 
ever, if the order of observations in such groups is not recorded, the 
difficulty arises that we are not able to determine the values of t and 
t 2 needed for the test procedure. 

It has been shown 6 that in such a case we may replace $t$ and $t_2$ by 
certain estimates of $t$ and $t_2$ without affecting seriously the probability 
of making an incorrect decision. The estimates of $t_1$ and $t_2$ (and thereby 
also an estimate of $t = t_1 + t_2$) are obtained as follows. Let $v_1$ be the 
number of successes in the group of $v$ observations drawn from the first 
binomial distribution, and let $v_2$ be the number of successes in the 
group of $v$ observations drawn from the second binomial distribution. 
Then for this pair of groups of $v$ observations we estimate the number 
of pairs (1, 0) to be $v_1 - (v_1 v_2/v)$ and the number of pairs (0, 1) to be 
$v_2 - (v_1 v_2/v)$. Thus, an estimate of $t_1$ is obtained by summing 
$v_1 - (v_1 v_2/v)$ over all pairs of groups observed, and that of $t_2$ is 
obtained by summing $v_2 - (v_1 v_2/v)$ over all pairs of groups observed. 
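
The estimates described above amount to a simple accumulation, sketched here in Python. The function name and the sample counts are hypothetical; each group pair is represented as a tuple of the success counts $(v_1, v_2)$.

```python
def estimate_counts(groups, v):
    """Estimate (t1, t2) from unordered groups of v observations each.

    groups: iterable of (v1, v2), the success counts in the paired groups
    drawn from the first and second binomial distribution.
    """
    t1 = t2 = 0.0
    for v1, v2 in groups:
        cross = v1 * v2 / v   # estimated number of pairs (1, 1)
        t1 += v1 - cross      # estimated number of pairs (1, 0)
        t2 += v2 - cross      # estimated number of pairs (0, 1)
    return t1, t2


# Hypothetical data: two pairs of groups of v = 5 observations.
t1, t2 = estimate_counts([(3, 4), (2, 5)], v=5)
print(t1, t2, t1 + t2)
```

The estimates are in general not integers; they are used in place of $t$ and $t_2$ when comparing with the acceptance and rejection numbers.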

For the effect of grouping on the OC and ASN curves, the results 
of Section 5.6 can be applied, since the test procedure discussed here 
reduces to that considered in Section 5.6 when $p = u/(1 + u)$, 
$m = t_1 + t_2 = t$, and $d_m = t_2$. 

6 See the author's report, Sequential Analysis of Statistical Data: Theory , sub- 
mitted to the Applied Mathematics Panel, National Defense Research Committee, 
Sept., 1943. 



Chapter 7. TESTING THAT THE MEAN OF A NORMAL DIS- 
TRIBUTION WITH KNOWN STANDARD DEVIATION FALLS 
SHORT OF A GIVEN VALUE 

7.1 Formulation of the Problem 

Let $x$ be a random variable which is normally distributed with 
unknown mean $\theta$ and known standard deviation $\sigma$. In this section we 
shall deal with the problem of testing the hypothesis that $\theta$ is less than 
or equal to some specified value $\theta'$. 

Such a problem arises frequently, for example, in quality control and 
acceptance inspection. Suppose that a lot consisting of a large number 
of units of a manufactured product is submitted for acceptance 
inspection. The number of units in the lot is assumed to be sufficiently large 
so that the lot may be treated as containing infinitely many units. 
Suppose that the result of an observation is a measurement $x$ of some 
quality characteristic of the unit, such as the weight, or hardness, or 
tensile strength. The value of $x$ will, in general, vary from unit to 
unit. It is assumed that $x$ is normally distributed with a known 
standard deviation $\sigma$ but unknown mean $\theta$. Suppose, furthermore, that the 
product is considered the more desirable the smaller the value of $\theta$. 
Then it will, in general, be possible to designate a particular value $\theta'$ 
such that we prefer to accept the lot if $\theta < \theta'$ and we prefer to reject 
the lot if $\theta > \theta'$. Thus, in such a situation, we are interested in 
devising a sampling plan to test the hypothesis that $\theta < \theta'$. 

Since quality control and acceptance inspection is an important field 
of application for such test procedures, we shall continue the discus- 
sion using the terminology of acceptance inspection. This, of course, 
should not be interpreted as a restriction on the general validity and 
applicability of the test procedure. 

7.2 Tolerated Risks of Making Wrong Decision 

If $\theta = \theta'$, we are indifferent whether the lot is accepted or rejected. 
The preference for acceptance increases with decreasing value of $\theta$ in 
the domain $\theta < \theta'$, and the preference for rejection increases with 
increasing value of $\theta$ in the domain $\theta > \theta'$. Thus, it will be possible, in 
general, to find two values $\theta_0$ and $\theta_1$ ($\theta_0 < \theta'$ and $\theta_1 > \theta'$) such that 
rejection of the lot is considered an error of practical consequence if 
$\theta \le \theta_0$, and acceptance of the lot is considered an error of practical 
consequence if $\theta \ge \theta_1$; for values $\theta$ between $\theta_0$ and $\theta_1$ we do not care 
particularly which decision is taken. Using the terminology introduced 
in Section 2.3.1, we may say that the zone of preference for acceptance 
consists of all values $\theta$ for which $\theta \le \theta_0$, the zone of preference for 
rejection is the set of all values $\theta$ for which $\theta \ge \theta_1$, and the zone of 
indifference consists of all values $\theta$ between $\theta_0$ and $\theta_1$. 

After the two values $\theta_0$ and $\theta_1$ have been chosen the risks that we 
are willing to tolerate may reasonably be expressed as follows. 1 The 
probability of rejecting the lot should not exceed a small preassigned 
value $\alpha$ whenever $\theta \le \theta_0$, and the probability of accepting the lot 
should not exceed a small preassigned value $\beta$ whenever $\theta \ge \theta_1$. Thus, 
the risks that we are willing to tolerate are characterized by the four 
numbers $\theta_0$, $\theta_1$, $\alpha$, and $\beta$. 


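1 See, for instance, Section 2.3.2. 


7.3 The Sequential Probability Ratio Test Corresponding to the 
Quantities $\theta_0$, $\theta_1$, $\alpha$, and $\beta$ 

The requirements regarding the tolerated risks are satisfied by the 
sequential probability ratio test of strength $(\alpha, \beta)$ for testing the 
hypothesis that $\theta = \theta_0$ against the alternative that $\theta = \theta_1$. This 
sequential test is given as follows. Let $x_1, x_2, \ldots$, etc., be the successive 
observations on $x$. The probability density of the sample $(x_1, \ldots, x_m)$ 
is given by 

(7:1)    $$ p_{0m} = \frac{1}{(2\pi)^{m/2}\sigma^m}\, e^{-\frac{1}{2\sigma^2}\sum_{\alpha=1}^{m}(x_\alpha - \theta_0)^2} $$

if $\theta = \theta_0$, and by 

(7:2)    $$ p_{1m} = \frac{1}{(2\pi)^{m/2}\sigma^m}\, e^{-\frac{1}{2\sigma^2}\sum_{\alpha=1}^{m}(x_\alpha - \theta_1)^2} $$

if $\theta = \theta_1$. The probability ratio $p_{1m}/p_{0m}$ is computed at each stage 
of the inspection. Additional observations are taken as long as 

(7:3)    $$ B < \frac{e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_1)^2}}{e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0)^2}} < A $$

Inspection is terminated with the acceptance of the lot if 

(7:4)    $$ \frac{e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_1)^2}}{e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0)^2}} \le B $$

Inspection is terminated with the rejection of the lot if 

(7:5)    $$ \frac{e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_1)^2}}{e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0)^2}} \ge A $$

According to Section 3.3 approximate values of $A$ and $B$ are given 
by $(1 - \beta)/\alpha$ and $\beta/(1 - \alpha)$, respectively. 

By taking logarithms and simplifying, the inequalities (7:3), (7:4), 
and (7:5) can be written as 

(7:6)    $$ \log\frac{\beta}{1 - \alpha} < \frac{\theta_1 - \theta_0}{\sigma^2}\sum x_\alpha + \frac{m}{2\sigma^2}(\theta_0^2 - \theta_1^2) < \log\frac{1 - \beta}{\alpha} $$

(7:7)    $$ \frac{\theta_1 - \theta_0}{\sigma^2}\sum x_\alpha + \frac{m}{2\sigma^2}(\theta_0^2 - \theta_1^2) \le \log\frac{\beta}{1 - \alpha} $$

(7:8)    $$ \frac{\theta_1 - \theta_0}{\sigma^2}\sum x_\alpha + \frac{m}{2\sigma^2}(\theta_0^2 - \theta_1^2) \ge \log\frac{1 - \beta}{\alpha} $$

respectively. 

Further simplification in carrying out the test procedure can be 
achieved by adding $(-m/2\sigma^2)(\theta_0^2 - \theta_1^2)$ to both sides of the 
inequalities (7:6), (7:7), and (7:8) and then dividing these inequalities by 
$(\theta_1 - \theta_0)/\sigma^2$. These operations transform the inequalities (7:6), (7:7), 
and (7:8) into 

(7:9)    $$ \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{\beta}{1 - \alpha} + m\,\frac{\theta_0 + \theta_1}{2} < \sum x_\alpha < \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{1 - \beta}{\alpha} + m\,\frac{\theta_0 + \theta_1}{2} $$

(7:10)    $$ \sum x_\alpha \le \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{\beta}{1 - \alpha} + m\,\frac{\theta_0 + \theta_1}{2} $$

(7:11)    $$ \sum x_\alpha \ge \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{1 - \beta}{\alpha} + m\,\frac{\theta_0 + \theta_1}{2} $$

respectively. 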
0i - 0 O a 2 
By using the inequalities (7:9), (7:10), and (7:11) the inspection 
plan may be carried out as follows. For each $m$ compute the 
acceptance number 

(7:12)    $$ a_m = \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{\beta}{1 - \alpha} + m\,\frac{\theta_0 + \theta_1}{2} $$

and the rejection number 

(7:13)    $$ r_m = \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{1 - \beta}{\alpha} + m\,\frac{\theta_0 + \theta_1}{2} $$

These acceptance and rejection numbers are best computed before 
inspection starts. Inspection is continued as long as $a_m < \sum x_\alpha < r_m$. 
At the first time when $\sum x_\alpha$ does not lie between $a_m$ and $r_m$, inspection 
is terminated. The lot is accepted if $\sum x_\alpha \le a_m$, and the lot is rejected 
if $\sum x_\alpha \ge r_m$. 

As an illustration, consider the following example. Let $\theta_0 = 135$, 
$\theta_1 = 150$, $\alpha = .01$, and $\beta = .03$. Furthermore, let $\sigma = 25$. The 
observations and the acceptance and rejection numbers are tabulated in 
Table 7, which shows that the sampling inspection is terminated at 
$m = 20$ with the acceptance of the lot. 
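
The example can be retraced with the Python sketch below. It assumes acceptance and rejection numbers of the form $a_m = h_0 + sm$ and $r_m = h_1 + sm$, with $h_0 = \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{\beta}{1-\alpha}$, $h_1 = \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{1-\beta}{\alpha}$, and $s = (\theta_0 + \theta_1)/2$. The function names are introduced here for illustration; the observations are the twenty values listed in Table 7.

```python
import math


def normal_lines(theta0, theta1, sigma, alpha, beta):
    """Return (h0, h1, s): intercepts and common slope of the decision lines."""
    k = sigma ** 2 / (theta1 - theta0)
    h0 = k * math.log(beta / (1 - alpha))
    h1 = k * math.log((1 - beta) / alpha)
    s = (theta0 + theta1) / 2
    return h0, h1, s


def inspect(xs, theta0, theta1, sigma, alpha, beta):
    """Return (decision, m) for the observations xs taken in order."""
    h0, h1, s = normal_lines(theta0, theta1, sigma, alpha, beta)
    total = 0.0
    m = 0
    for m, x in enumerate(xs, start=1):
        total += x                      # cumulated sum of observations
        if total <= h0 + s * m:
            return "accept", m
        if total >= h1 + s * m:
            return "reject", m
    return "no decision yet", m


table7_x = [151, 144, 121, 137, 138, 136, 155, 160, 144, 145,
            130, 120, 104, 140, 125, 106, 145, 123, 138, 108]
print(inspect(table7_x, 135, 150, 25, 0.01, 0.03))
```

With these inputs the sketch stops at $m = 20$ with acceptance, as stated in the text. The integers in Table 7 appear to be $a_m$ rounded down and $r_m$ rounded up, which is an inference from the table.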

The test procedure can also be carried out graphically as shown in 
Fig. 15. The number $m$ of observations is measured along the 
horizontal axis. The points $(m, a_m)$ will lie on a straight line $L_0$ and the 
points $(m, r_m)$ will lie on a parallel line $L_1$. We draw the parallel lines 
$L_0$ and $L_1$ before inspection starts. The points $(m, \sum_{\alpha=1}^{m} x_\alpha)$ are plotted 
as inspection goes on. Inspection is continued as long as the plotted 
points $(m, \sum x_\alpha)$ lie between the lines $L_0$ and $L_1$. Inspection is 
terminated at the first time when the point $(m, \sum x_\alpha)$ does not lie between 
$L_0$ and $L_1$. If it lies on $L_0$ or below the lot is accepted, and if it lies 
on $L_1$ or above the lot is rejected. 

Fig. 15 

                               TABLE 7 

   m              a_m           x            \sum x          r_m 
 (number of    (acceptance   (observed    (cumulated      (rejection 
 observations)   number)       value)      sum of           number) 
                                           observed 
                                           values) 
 -------------------------------------------------------------------- 
    1                          151            151             334 
    2             139          144            295             476 
    3             281          121            416             619 
    4             424          137            553             761 
    5             566          138            691             904 
    6             709          136            827            1046 
    7             851          155            982            1189 
    8             994          160           1142            1331 
    9            1136          144           1286            1474 
   10            1279          145           1431            1616 
   11            1421          130           1561            1759 
   12            1564          120           1681            1901 
   13            1706          104           1785            2044 
   14            1849          140           1925            2186 
   15            1991          125           2050            2329 
   16            2134          106           2156            2471 
   17            2276          145           2301            2614 
   18            2419          123           2424            2756 
   19            2561          138           2562            2899 
   20            2704          108           2670            3041 
   21            2846                                        3184 
   22            2989                                        3326 
   23            3131                                        3469 
   24            3274                                        3611 
   25            3416                                        3754 

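The common slope of the lines $L_0$ and $L_1$ is given by 

(7:14)    $$ s = \frac{\theta_0 + \theta_1}{2} $$

The intercept of $L_0$ is equal to 

(7:15)    $$ h_0 = \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{\beta}{1 - \alpha} $$

and the intercept of $L_1$ is given by 

(7:16)    $$ h_1 = \frac{\sigma^2}{\theta_1 - \theta_0}\log\frac{1 - \beta}{\alpha} $$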
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7.4 The Operating Characteristic (OC) Curve of the Test 

Let L{6) denote the probability that the sequential test will lead to 
the acceptance of the lot when 0 is the true mean value. The function 
L{B) is called the operating characteristic function of the test. Ap- 
proximate formulas for the OC function are derived in Section 3.4 and 
the general results are applied to testing the mean of a normal popu- 
lation. [See equation (3:48).] It is shown there that 


(7:17) 


where 

(7:18) 



0i + 0o — 20 
0i — 0o 


It can be seen from (7:17) and (7:18) that L(0) is an increasing func- 
tion of h and h is a decreasing function of 0. Hence L(0) is a decreas- 
ing function of 0. 

For 0 = — oo, 0 O , (0 O + 0i)/2, 0i, +°o the values of L(0) obtained 
from (7:17) are given as follows. 2 


(7:19) 


L(— oo ) = 1; L(0 O ) = l — a 



log 


1 -fl 

a 


log 


1 -P 

a 


log 


£ 

l — a 


L(9 1 ) - p 


L( oo ) = 0 


The computation of these five points of the OC curve will suffice in 
many applications. 
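
The decision lines (7:14) to (7:16) and the OC approximation (7:17) and (7:18) can be tried out numerically. The Python sketch below is an illustration added here, not part of the original exposition. The function names are hypothetical, and the parameter values θ_0 = 135, θ_1 = 150, σ = 25, α = 0.01, β = 0.03 are assumptions: they are not stated in the passage above and were chosen only because they give the slope 142.5 by which the tabulated acceptance and rejection numbers increase. The observed values are those of the table.

```python
import math

# Observed values x from the table (m = 1, ..., 20).
OBSERVED = [151, 144, 121, 137, 138, 136, 155, 160, 144, 145,
            130, 120, 104, 140, 125, 106, 145, 123, 138, 108]


def mean_test_lines(theta0, theta1, sigma, alpha, beta):
    """Slope s and intercepts h0, h1 of L_0 and L_1, eqs. (7:14)-(7:16)."""
    s = (theta0 + theta1) / 2.0
    scale = sigma ** 2 / (theta1 - theta0)
    h0 = scale * math.log(beta / (1.0 - alpha))
    h1 = scale * math.log((1.0 - beta) / alpha)
    return s, h0, h1


def run_mean_test(xs, s, h0, h1):
    """Replay observations; stop the first time the cumulated sum leaves
    the open band between L_0 and L_1."""
    total = 0.0
    for m, x in enumerate(xs, start=1):
        total += x
        if total <= h0 + s * m:      # on L_0 or below: accept
            return "accept", m
        if total >= h1 + s * m:      # on L_1 or above: reject
            return "reject", m
    return "continue", len(xs)


def oc_mean(theta, theta0, theta1, alpha, beta):
    """Approximate OC value L(theta) from (7:17) and (7:18)."""
    h = (theta1 + theta0 - 2.0 * theta) / (theta1 - theta0)
    log_a = math.log((1.0 - beta) / alpha)
    log_b = math.log(beta / (1.0 - alpha))
    if abs(h) < 1e-12:               # limiting value as h -> 0
        return log_a / (log_a - log_b)
    num = math.exp(h * log_a) - 1.0
    den = math.exp(h * log_a) - math.exp(h * log_b)
    return num / den


# Assumed illustrative parameters (not taken from the passage above).
s, h0, h1 = mean_test_lines(135.0, 150.0, 25.0, 0.01, 0.03)
decision, m_stop = run_mean_test(OBSERVED, s, h0, h1)
```

Under these assumed values the replay ends with acceptance at m = 20, which agrees with the table, where the cumulated sum 2670 is the first to fall at or below its acceptance number (2704). The OC function evaluated at θ_0 and θ_1 returns 1 - α and β, as (7:19) requires.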

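It may be of interest to express L(θ) in terms of the intercepts h_0
and h_1 and the common slope s of the lines L_0 and L_1.³ From (7:17)
and (7:18) it follows that

(7:20)    L(\theta) \approx \frac{e^{\frac{\theta_1 + \theta_0 - 2\theta}{\theta_1 - \theta_0} \log \frac{1 - \beta}{\alpha}} - 1}{e^{\frac{\theta_1 + \theta_0 - 2\theta}{\theta_1 - \theta_0} \log \frac{1 - \beta}{\alpha}} - e^{\frac{\theta_1 + \theta_0 - 2\theta}{\theta_1 - \theta_0} \log \frac{\beta}{1 - \alpha}}}

Since

          h_0 = \frac{\sigma^2}{\theta_1 - \theta_0} \log \frac{\beta}{1 - \alpha}, \quad h_1 = \frac{\sigma^2}{\theta_1 - \theta_0} \log \frac{1 - \beta}{\alpha}, \quad s = \frac{\theta_1 + \theta_0}{2},

we obtain from (7:20)

(7:21)    L(\theta) \approx \frac{e^{\frac{2}{\sigma^2}(s - \theta) h_1} - 1}{e^{\frac{2}{\sigma^2}(s - \theta) h_1} - e^{\frac{2}{\sigma^2}(s - \theta) h_0}}

2 For θ = (θ_1 + θ_0)/2 we have h = 0 and the limiting value of the
right-hand member of (7:17) as h → 0 is equal to

          \frac{\log \frac{1 - \beta}{\alpha}}{\log \frac{1 - \beta}{\alpha} - \log \frac{\beta}{1 - \alpha}}
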

7.5 The Average Amount of Inspection Required by the Test 

In Section 3.5 the following approximation formula is derived for
the expected value E_θ(n) of the number n of observations required by
the sampling plan.

(7:22)    E_\theta(n) \approx \frac{L(\theta) \log \frac{\beta}{1 - \alpha} + [1 - L(\theta)] \log \frac{1 - \beta}{\alpha}}{E_\theta(z)}

where

(7:23)    z = \log \frac{f(x, \theta_1)}{f(x, \theta_0)} = \log \frac{e^{-\frac{1}{2\sigma^2}(x - \theta_1)^2}}{e^{-\frac{1}{2\sigma^2}(x - \theta_0)^2}} = \frac{1}{2\sigma^2} [2(\theta_1 - \theta_0) x + \theta_0^2 - \theta_1^2]

and E_θ(z) denotes the expected value of z when θ is the true mean of x.
The value of E_θ(z) is given in Section 3.5, equation (3:60).

(7:24)    E_\theta(z) = \frac{1}{2\sigma^2} [2(\theta_1 - \theta_0) \theta + \theta_0^2 - \theta_1^2]

Hence

(7:25)    E_\theta(n) = 2\sigma^2 \frac{L(\theta) \log \frac{\beta}{1 - \alpha} + [1 - L(\theta)] \log \frac{1 - \beta}{\alpha}}{\theta_0^2 - \theta_1^2 + 2(\theta_1 - \theta_0) \theta} = \frac{h_1 + L(\theta)(h_0 - h_1)}{\theta - s}

where h_0 and h_1 are the intercepts and s is the common slope of the
lines L_0 and L_1.

3 See also SRG 255, p. 4.19.

For θ = s, the right-hand member of (7:25) takes the form 0/0. It
is shown in the Appendix, equation (A:99), that the limiting value is
given by

(7:26)    E_s(n) = \frac{-\log \frac{\beta}{1 - \alpha} \log \frac{1 - \beta}{\alpha}}{E_s(z^2)}

Since E_s(z) = 0, E_s(z²) is equal to the variance of z. From (7:23)
it follows that the variance of z is equal to (θ_1 - θ_0)²/σ². Hence

(7:27)    E_s(n) = \frac{-\log \frac{\beta}{1 - \alpha} \log \frac{1 - \beta}{\alpha}}{(\theta_1 - \theta_0)^2} \sigma^2
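
As a rough numerical companion to (7:25) and (7:27), the following Python sketch evaluates the approximate average number of observations for the test of the mean. It is an added illustration only; the function name and the parameter values in the usage lines (θ_0 = 135, θ_1 = 150, σ = 25, α = 0.01, β = 0.03) are assumptions and are not taken from the text.

```python
import math


def asn_mean(theta, theta0, theta1, sigma, alpha, beta):
    """Approximate E_theta(n) for the test of the mean, eqs. (7:25), (7:27)."""
    log_a = math.log((1.0 - beta) / alpha)
    log_b = math.log(beta / (1.0 - alpha))
    scale = sigma ** 2 / (theta1 - theta0)
    h0, h1 = scale * log_b, scale * log_a        # intercepts (7:15), (7:16)
    s = (theta0 + theta1) / 2.0                  # common slope (7:14)
    if math.isclose(theta, s):
        # (7:25) becomes 0/0 at theta = s; use the limiting value (7:27).
        return -sigma ** 2 * log_a * log_b / (theta1 - theta0) ** 2
    h = (theta1 + theta0 - 2.0 * theta) / (theta1 - theta0)      # (7:18)
    oc = (math.exp(h * log_a) - 1.0) / (
        math.exp(h * log_a) - math.exp(h * log_b))               # (7:17)
    return (h1 + oc * (h0 - h1)) / (theta - s)                   # (7:25)


# Assumed illustrative parameters.
asn_at_theta0 = asn_mean(135.0, 135.0, 150.0, 25.0, 0.01, 0.03)
asn_at_slope = asn_mean(142.5, 135.0, 150.0, 25.0, 0.01, 0.03)
asn_near_slope = asn_mean(142.51, 135.0, 150.0, 25.0, 0.01, 0.03)
```

Evaluating the function slightly away from θ = s gives a value close to the limiting value (7:27), which is a convenient check that the two branches agree.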




Chapter 8. TESTING THAT THE STANDARD DEVIATION OF
A NORMAL DISTRIBUTION DOES NOT EXCEED A GIVEN VALUE

8.1 Formulation of the Problem 

Let x be a normally distributed variate. In this section we shall
deal with the problem of testing the hypothesis that the standard
deviation σ of x does not exceed a given value σ'. There are two cases
to be considered: the mean of x is known or unknown. First we shall
treat the case when the mean of x is known. If the mean of x is
unknown, only a slight modification of the test procedure will be
necessary, as will be seen later.

This problem, like the one treated in Section 7, arises frequently in
quality control and acceptance inspection. Suppose that x is some
measurable quality characteristic of a manufactured product and that
x is normally distributed in the population of units produced.
Suppose, furthermore, that the quality of the product is considered the
better the smaller the standard deviation σ. Thus, there will be, in
general, a value σ' such that the product is considered substandard if
σ > σ' and the product is considered satisfactory (meets specification)
if σ ≤ σ'. Since σ is unknown, the problem is to devise a sampling
plan for testing the hypothesis that the product is satisfactory, i.e.,
that σ ≤ σ'.

8.2 Tolerated Risks for Making a Wrong Decision 

If the quality of the product is exactly on the margin, i.e., if σ = σ',
it will make no difference whether the product is classified as
satisfactory or as substandard. However, if σ is considerably smaller than
σ', the classification of the product as substandard will usually be
regarded as an error of practical importance. Similarly, if σ is much
larger than σ', the classification of the product as satisfactory will be
a serious error. Thus, it will be possible to specify two values σ_0 and
σ_1 (σ_0 < σ' and σ_1 > σ') such that the classification of the product as
substandard is considered an error of practical importance whenever
σ ≤ σ_0, and the classification of the product as satisfactory is regarded
as an error of practical consequence whenever σ ≥ σ_1; for values σ
between σ_0 and σ_1 we do not care particularly which action is taken.

In accordance with the considerations in Section 2.3.2, the risks that
we are willing to tolerate may reasonably be stated as follows: The
probability of classifying the product as substandard should not exceed
a small preassigned value α whenever σ ≤ σ_0, and the probability of
classifying the product as satisfactory should not exceed a preassigned
value β whenever σ ≥ σ_1.


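8.3 The Sequential Probability Ratio Test Corresponding to the
Quantities σ_0, σ_1, α, and β

A sampling plan satisfying the requirements regarding the tolerated
risks is given by the sequential probability ratio test of strength (α, β)
for testing the hypothesis that σ = σ_0 against the alternative that
σ = σ_1.

Let x_1, x_2, ..., etc., denote the successive observations on x. The
probability density of the sample (x_1, ..., x_m) is given by

(8:1)    p_m = \frac{1}{(2\pi)^{m/2} \sigma^m} e^{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2}

where the value of the mean θ is assumed to be known. Let p_im
denote the expression we obtain if σ is replaced by σ_i (i = 0, 1) in the
right-hand member of (8:1). The sequential probability ratio test is
given as follows. The probability ratio p_1m/p_0m is computed at each
stage of the experiment. Additional observations are taken as long as¹

(8:2)    \frac{\beta}{1 - \alpha} < \frac{p_{1m}}{p_{0m}} = \frac{\frac{1}{\sigma_1^m} e^{-\frac{1}{2\sigma_1^2} \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2}}{\frac{1}{\sigma_0^m} e^{-\frac{1}{2\sigma_0^2} \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2}} < \frac{1 - \beta}{\alpha}

The product is classified as satisfactory if

(8:3)    \frac{\frac{1}{\sigma_1^m} e^{-\frac{1}{2\sigma_1^2} \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2}}{\frac{1}{\sigma_0^m} e^{-\frac{1}{2\sigma_0^2} \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2}} \le \frac{\beta}{1 - \alpha}

The product is classified as substandard if

(8:4)    \frac{\frac{1}{\sigma_1^m} e^{-\frac{1}{2\sigma_1^2} \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2}}{\frac{1}{\sigma_0^m} e^{-\frac{1}{2\sigma_0^2} \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2}} \ge \frac{1 - \beta}{\alpha}

Taking logarithms, dividing by (1/2σ_0²) - (1/2σ_1²) and simplifying,
the inequalities (8:2), (8:3), and (8:4) will become

(8:5)    \frac{2 \log \frac{\beta}{1 - \alpha} + m \log \frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}} < \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2 < \frac{2 \log \frac{1 - \beta}{\alpha} + m \log \frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}

(8:6)    \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2 \le \frac{2 \log \frac{\beta}{1 - \alpha} + m \log \frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}

and

(8:7)    \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2 \ge \frac{2 \log \frac{1 - \beta}{\alpha} + m \log \frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}

respectively.

1 There is a slight approximation involved in the formulas given below, since
the constants A and B are put equal to (1 - β)/α and β/(1 - α) respectively.
In this connection see Section 3.3.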

On the basis of the inequalities (8:5), (8:6), and (8:7), the test
procedure can be carried out as follows: For each integral value m
compute the acceptance number

(8:8)    a_m = \frac{2 \log \frac{\beta}{1 - \alpha}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}} + m \frac{\log \frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}

and the rejection number

(8:9)    r_m = \frac{2 \log \frac{1 - \beta}{\alpha}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}} + m \frac{\log \frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}

These acceptance and rejection numbers do not depend on the
outcome of the observations and, therefore, they can be computed
before inspection starts. Inspection is continued as long as
a_m < Σ_{α=1}^{m} (x_α - θ)² < r_m. The first time that Σ(x_α - θ)² does
not lie between a_m and r_m, inspection is terminated. If at the final
stage Σ_{α=1}^{m} (x_α - θ)² ≤ a_m the product is declared satisfactory,
and if Σ_{α=1}^{m} (x_α - θ)² ≥ r_m the product is declared substandard.
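
The procedure based on (8:8) and (8:9) can be sketched in a few lines of Python. This block is an added illustration, not the author's computation; the function names and the values σ_0 = 1, σ_1 = 2, α = β = 0.05 used in the usage lines are assumptions made only to have something to run.

```python
import math


def sd_test_numbers(m, sigma0, sigma1, alpha, beta):
    """Acceptance number a_m and rejection number r_m, eqs. (8:8), (8:9)."""
    denom = 1.0 / sigma0 ** 2 - 1.0 / sigma1 ** 2
    growth = m * math.log(sigma1 ** 2 / sigma0 ** 2) / denom
    a_m = 2.0 * math.log(beta / (1.0 - alpha)) / denom + growth
    r_m = 2.0 * math.log((1.0 - beta) / alpha) / denom + growth
    return a_m, r_m


def run_sd_test(xs, theta, sigma0, sigma1, alpha, beta):
    """Sequential test of sigma with known mean theta.

    Stops the first time the sum of squared deviations from theta
    does not lie strictly between a_m and r_m.
    """
    total = 0.0
    for m, x in enumerate(xs, start=1):
        total += (x - theta) ** 2
        a_m, r_m = sd_test_numbers(m, sigma0, sigma1, alpha, beta)
        if total <= a_m:
            return "satisfactory", m
        if total >= r_m:
            return "substandard", m
    return "continue", len(xs)


# Assumed illustrative inputs: no scatter at all, then a very large deviation.
tight = run_sd_test([0.0] * 10, 0.0, 1.0, 2.0, 0.05, 0.05)
wide = run_sd_test([5.0] * 3, 0.0, 1.0, 2.0, 0.05, 0.05)
```

Because a_m is negative for small m, a decision of "satisfactory" cannot occur before a_m becomes non-negative, even when every observation equals θ.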

A graphical presentation of the test procedure is shown in Fig. 16.

[Fig. 16. Horizontal axis: m.]

The number m of observations is measured along the horizontal axis.
Since both a_m and r_m are linear functions of m, the points (m, a_m) will
lie on a straight line L_0 and the points (m, r_m) will lie on a straight
line L_1. These two lines are parallel and the common slope is given by

(8:10)    s = \frac{\log \frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}

The intercept of L_0 is equal to

(8:11)    h_0 = \frac{2 \log \frac{\beta}{1 - \alpha}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}

and the intercept of L_1 is given by

(8:12)    h_1 = \frac{2 \log \frac{1 - \beta}{\alpha}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}

The lines L_0 and L_1 can be drawn before inspection starts. As
inspection goes on the points [m, Σ_{α=1}^{m} (x_α - θ)²] are plotted.
The first time that the point [m, Σ(x_α - θ)²] does not lie between the
lines L_0 and L_1, inspection is terminated. If the point
[m, Σ(x_α - θ)²] lies on L_0 or below, the hypothesis that the product is
satisfactory is accepted; and if the point [m, Σ(x_α - θ)²] lies on L_1
or above, the product is declared substandard.

8.4 The Operating Characteristic (OC) Function of the Test 

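For any value σ, let L(σ) denote the probability that the test will
terminate with the acceptance of the hypothesis that the product is
satisfactory. The function L(σ) is called the operating characteristic
function of the test.

In Section 3.4 a general method is given for deriving an
approximation formula for the OC function for any sequential probability
ratio test. Applying the result of that section, we obtain

(8:13)    L(\sigma) = \frac{\left(\frac{1 - \beta}{\alpha}\right)^h - 1}{\left(\frac{1 - \beta}{\alpha}\right)^h - \left(\frac{\beta}{1 - \alpha}\right)^h}

where h is the root of the equation

(8:14)    \frac{1}{\sqrt{2\pi}\,\sigma} \left(\frac{\sigma_0}{\sigma_1}\right)^h \int_{-\infty}^{\infty} \left[\frac{e^{-\frac{1}{2\sigma_1^2}(x - \theta)^2}}{e^{-\frac{1}{2\sigma_0^2}(x - \theta)^2}}\right]^h e^{-\frac{1}{2\sigma^2}(x - \theta)^2} dx = 1

It can be seen that the integral on the left side of (8:14) has a finite
value only if (h/σ_1²) - (h/σ_0²) + (1/σ²) > 0. In this case, as can be
verified, we have

(8:15)    \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left[\frac{e^{-\frac{1}{2\sigma_1^2}(x - \theta)^2}}{e^{-\frac{1}{2\sigma_0^2}(x - \theta)^2}}\right]^h e^{-\frac{1}{2\sigma^2}(x - \theta)^2} dx = \frac{1}{\sqrt{\frac{h}{\sigma_1^2} - \frac{h}{\sigma_0^2} + \frac{1}{\sigma^2}}}

Hence equation (8:14) can be written as

(8:16)    \frac{\left(\frac{\sigma_0}{\sigma_1}\right)^h}{\sigma \sqrt{\frac{h}{\sigma_1^2} - \frac{h}{\sigma_0^2} + \frac{1}{\sigma^2}}} = 1

Instead of solving (8:16) with respect to h, we shall solve it with
respect to σ. We obtain

(8:17)    \sigma = \sqrt{\frac{\left(\frac{\sigma_0}{\sigma_1}\right)^{2h} - 1}{\frac{h}{\sigma_1^2} - \frac{h}{\sigma_0^2}}}

With the use of equations (8:13) and (8:17), the OC curve can be
plotted as follows. For any given value of h we compute σ and L(σ)
from equations (8:13) and (8:17). The pair [σ, L(σ)] obtained in this
way gives us a point on the OC curve. Computing [σ, L(σ)] for a
sufficiently large number of values of h, we obtain enough points to
draw the OC curve.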
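
The parametric construction just described, in which each value of h yields one point [σ, L(σ)], can be sketched as follows. The code is an added illustration; the function name and the numerical values σ_0 = 1, σ_1 = 2, α = β = 0.05 are assumptions. The value h = 0 is excluded because (8:13) and (8:17) both take the form 0/0 there.

```python
import math


def oc_point(h, sigma0, sigma1, alpha, beta):
    """Return the point (sigma, L(sigma)) for a given h != 0.

    sigma comes from (8:17) and L(sigma) from (8:13).
    """
    if h == 0:
        raise ValueError("h = 0 must be handled by a limiting value")
    big_a = (1.0 - beta) / alpha
    big_b = beta / (1.0 - alpha)
    ratio = (sigma0 / sigma1) ** (2.0 * h)
    sigma_sq = (ratio - 1.0) / (h / sigma1 ** 2 - h / sigma0 ** 2)
    oc = (big_a ** h - 1.0) / (big_a ** h - big_b ** h)
    return math.sqrt(sigma_sq), oc


# Assumed illustrative parameters; a modest grid of h values traces the curve.
curve = [oc_point(h, 1.0, 2.0, 0.05, 0.05)
         for h in (3.0, 2.0, 1.0, 0.5, -0.5, -1.0, -2.0, -3.0)]
```

Two values of h give points that are known in advance: h = 1 returns σ = σ_0 with L = 1 - α, and h = -1 returns σ = σ_1 with L = β. As h decreases through the grid, σ increases and L(σ) decreases.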

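For computational purposes, it may be convenient to put²

(8:18)    h \left(\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_0^2}\right) = t \quad \text{or} \quad h = \frac{t}{\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_0^2}}

Then equations (8:13) and (8:17) can be written as

(8:19)    L(\sigma) = \frac{\left(\frac{1 - \beta}{\alpha}\right)^{t / \left(\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_0^2}\right)} - 1}{\left(\frac{1 - \beta}{\alpha}\right)^{t / \left(\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_0^2}\right)} - \left(\frac{\beta}{1 - \alpha}\right)^{t / \left(\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_0^2}\right)}} = \frac{e^{-t h_1} - 1}{e^{-t h_1} - e^{-t h_0}}

(8:20)    \sigma = \sqrt{\frac{e^{2ts} - 1}{2t}}

where s is the common slope and h_0 and h_1 are the intercepts of the
lines L_0 and L_1. Equations (8:19) and (8:20) may be more convenient
for the computation of the OC curve than the original equations (8:13)
and (8:17).

2 A similar simplification was made by the Statistical Research Group. See
SRG 255, p. 6.31. The parameter t used there corresponds to -t here.

For σ = 0, σ_0, √s, σ_1, +∞ the values of L(σ) are given as follows:

(8:21)    L(0) = 1

          L(\sigma_0) = 1 - \alpha

          L(\sqrt{s}) = \frac{h_1}{h_1 - h_0}

          L(\sigma_1) = \beta

          L(\infty) = 0

These five points already determine roughly the shape of the OC curve
and in many instances it will not be necessary to compute further
points.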

8.5 The Average Amount of Inspection Required by the Test 

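According to the results in Section 3.5, an approximation formula
for the expected value E_σ(n) of the number n of observations required
by the sampling plan is given by

(8:22)    E_\sigma(n) \approx \frac{L(\sigma) \log \frac{\beta}{1 - \alpha} + [1 - L(\sigma)] \log \frac{1 - \beta}{\alpha}}{E_\sigma(z)}

where

(8:23)    z = \log \frac{\frac{1}{\sigma_1} e^{-\frac{1}{2\sigma_1^2}(x - \theta)^2}}{\frac{1}{\sigma_0} e^{-\frac{1}{2\sigma_0^2}(x - \theta)^2}} = \log \frac{\sigma_0}{\sigma_1} + \frac{1}{2} \left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right) (x - \theta)^2

and E_σ(z) denotes the expected value of z when σ is the standard
deviation of x. We have

(8:24)    E_\sigma(z) = \frac{1}{2} \left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right) E(x - \theta)^2 + \log \frac{\sigma_0}{\sigma_1}

Hence, substituting the right-hand member of (8:24) for E_σ(z) in
(8:22) we obtain³

(8:25)    E_\sigma(n) = \frac{L(\sigma) \left[\log \frac{\beta}{1 - \alpha} - \log \frac{1 - \beta}{\alpha}\right] + \log \frac{1 - \beta}{\alpha}}{\frac{1}{2} \left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right) \sigma^2 + \log \frac{\sigma_0}{\sigma_1}} = \frac{L(\sigma)(h_0 - h_1) + h_1}{\sigma^2 - s}

For σ = √s the expected value of z is equal to 0 and the right-hand
member of (8:25) takes the form 0/0. According to equation (A:99)
in the Appendix, the limiting value is given by

(8:26)    E_{\sqrt{s}}(n) = \frac{-\log \frac{\beta}{1 - \alpha} \log \frac{1 - \beta}{\alpha}}{E_{\sqrt{s}}(z^2)}

Since E_√s(z) = 0, E_√s(z²) is equal to the variance of z when σ = √s.
It follows easily from (8:23) that this variance is equal to
(1/2)(1/σ_0² - 1/σ_1²)² s². Hence

(8:27)    E_{\sqrt{s}}(n) = \frac{-\log \frac{\beta}{1 - \alpha} \log \frac{1 - \beta}{\alpha}}{\frac{1}{2} \left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right)^2 s^2} = \frac{-h_0 h_1}{2 s^2}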
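
A small Python sketch of (8:25) and (8:27) follows. It is an added illustration rather than the author's procedure. Since L(σ) is available only through the auxiliary quantity h, the hypothetical function below takes h as its argument and returns σ together with the corresponding approximate values of L(σ) and E_σ(n). The parameter values are assumptions.

```python
import math


def asn_point(h, sigma0, sigma1, alpha, beta):
    """Return (sigma, L(sigma), E_sigma(n)) for a given value of h.

    Uses (8:17) and (8:13) for sigma and L, (8:25) for the expected
    number of observations, and the limiting values at h = 0,
    where sigma = sqrt(s).
    """
    denom = 1.0 / sigma0 ** 2 - 1.0 / sigma1 ** 2
    log_a = math.log((1.0 - beta) / alpha)
    log_b = math.log(beta / (1.0 - alpha))
    s = math.log(sigma1 ** 2 / sigma0 ** 2) / denom      # (8:10)
    h0 = 2.0 * log_b / denom                             # (8:11)
    h1 = 2.0 * log_a / denom                             # (8:12)
    if h == 0:
        # (8:21) middle point and (8:27)
        return math.sqrt(s), h1 / (h1 - h0), -h0 * h1 / (2.0 * s ** 2)
    sigma_sq = ((sigma0 / sigma1) ** (2.0 * h) - 1.0) / (
        h / sigma1 ** 2 - h / sigma0 ** 2)
    oc = (math.exp(h * log_a) - 1.0) / (
        math.exp(h * log_a) - math.exp(h * log_b))
    asn = (oc * (h0 - h1) + h1) / (sigma_sq - s)         # (8:25)
    return math.sqrt(sigma_sq), oc, asn


# Assumed illustrative parameters.
at_sigma0 = asn_point(1.0, 1.0, 2.0, 0.05, 0.05)
at_sqrt_s = asn_point(0.0, 1.0, 2.0, 0.05, 0.05)
near_sqrt_s = asn_point(1e-4, 1.0, 2.0, 0.05, 0.05)
```

Comparing the value at h = 0 with the value at a small nonzero h is a simple way to confirm that (8:25) approaches the limit (8:27).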


3 The expression of E_σ(n) in terms of the slope and intercepts of the decision
lines is contained also in SRG 255, p. 6.34.

8.6 Modification of the Test Procedure When the Population Mean
Is Not Known⁴

If the mean θ of x is not known, the following two modifications of
the test procedure are to be made: (1) replace Σ_{α=1}^{m} (x_α - θ)² by
Σ_{α=1}^{m} (x_α - x̄)², where x̄ = (x_1 + ... + x_m)/m; (2) the acceptance
number a_m is replaced by a_{m-1} and the rejection number r_m is replaced
by r_{m-1}. Thus, if the mean is unknown, the acceptance and rejection
numbers at the mth trial are equal to the acceptance and rejection
numbers corresponding to the (m - 1)th trial when the mean is known.

The formula for the OC curve remains unchanged and the expected
value of the number of observations required by the test is larger by 1
when the mean is unknown than when the mean is known.
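
The two modifications can be sketched in Python as follows. This is an added illustration under assumed parameter values (σ_0 = 1, σ_1 = 2, α = β = 0.05), and the function name is hypothetical. The running sum of squares about the sample mean is updated with Welford's recurrence, which is a numerical convenience chosen here and is not taken from the text.

```python
import math


def run_sd_test_unknown_mean(xs, sigma0, sigma1, alpha, beta):
    """Sequential test of sigma when the mean is unknown (Section 8.6).

    At trial m the statistic sum (x - xbar)^2 is compared with the
    known-mean numbers of trial m - 1, i.e. a_{m-1} and r_{m-1}.
    """
    denom = 1.0 / sigma0 ** 2 - 1.0 / sigma1 ** 2
    s = math.log(sigma1 ** 2 / sigma0 ** 2) / denom
    h0 = 2.0 * math.log(beta / (1.0 - alpha)) / denom
    h1 = 2.0 * math.log((1.0 - beta) / alpha) / denom
    mean = 0.0
    ss = 0.0  # running sum of squared deviations from the sample mean
    for m, x in enumerate(xs, start=1):
        old_mean = mean
        mean += (x - old_mean) / m
        ss += (x - old_mean) * (x - mean)   # Welford update
        a_prev = h0 + s * (m - 1)
        r_prev = h1 + s * (m - 1)
        if ss <= a_prev:
            return "satisfactory", m
        if ss >= r_prev:
            return "substandard", m
    return "continue", len(xs)


# Assumed illustrative inputs.
constant = run_sd_test_unknown_mean([0.0] * 10, 1.0, 2.0, 0.05, 0.05)
spread = run_sd_test_unknown_mean([10.0, -10.0], 1.0, 2.0, 0.05, 0.05)
```

With constant observations the decision comes one trial later than it would under the same parameters with a known mean, in line with the remark that the expected number of observations is larger by 1.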


4 The result contained in this section was found by C. Stein and M. A. Girshick,
independently of each other. The proof is based on a transformation of the
observations which reduces this case to the case when the mean is known. See
Girshick's paper, "Contribution to the Theory of Sequential Analysis," The Annals
of Mathematical Statistics, June, 1946.



Chapter 9. TESTING THAT THE MEAN OF A NORMAL DIS- 
TRIBUTION WITH KNOWN VARIANCE IS EQUAL TO A 
SPECIFIED VALUE 


9.1 Formulation of the Problem 


Let x be a quality characteristic of a product, such as weight,
diameter, or hardness. Suppose that x is normally distributed in the
population of all units produced and that the standard deviation σ of x is
known but the mean θ of x is unknown. Suppose, furthermore, that
a particular value of θ, say θ_0, is considered the most desirable value
for the product. In general, the greater the absolute deviation of the
true value θ from the most desirable value θ_0, the less satisfactory the
product. Since the manufacturer would like to achieve and maintain
the value θ_0 of θ as closely as possible, he will be interested in testing
the hypothesis that θ = θ_0. If the evidence supplied by a sample
should indicate that θ ≠ θ_0, he will try to improve the production
process. Of course, if θ ≠ θ_0 but is near θ_0, there is no particular need to
improve the production, and acceptance of the hypothesis that θ = θ_0
would not be a serious error. However, there will be, in general, a
positive value δ such that the acceptance of the hypothesis that θ = θ_0
is regarded as an error of practical importance whenever
|θ - θ_0|/σ ≥ δ.

The situation described in the preceding paragraph will thus lead
to the following problem: A sampling plan is to be devised for which
the probability that the hypothesis that θ = θ_0 will be rejected (the
product will be declared substandard) does not exceed a small
preassigned value α when θ = θ_0, and the probability of accepting the
hypothesis that θ = θ_0 (declaring the product satisfactory) does not
exceed a small preassigned value β whenever |θ - θ_0|/σ ≥ δ.


9.2 A Sequential Sampling Plan Satisfying the Imposed Requirements

It has been shown in Section 4.1.4 that an adequate sampling plan
for the problem described in Section 9.1 is given as follows. Compute
the ratio

(9:1)    \frac{p_{1m}}{p_{0m}} = \frac{\frac{1}{2} e^{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{m} (x_\alpha - \theta_0 - \delta\sigma)^2} + \frac{1}{2} e^{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{m} (x_\alpha - \theta_0 + \delta\sigma)^2}}{e^{-\frac{1}{2\sigma^2} \sum_{\alpha=1}^{m} (x_\alpha - \theta_0)^2}}

at each stage of the experiment. Continue taking observations as long
as

(9:2)    B < \frac{p_{1m}}{p_{0m}} < A

Accept the hypothesis that the product is satisfactory if

(9:3)    \frac{p_{1m}}{p_{0m}} \le B

Reject the hypothesis that the product is satisfactory if

(9:4)    \frac{p_{1m}}{p_{0m}} \ge A

To satisfy the requirements imposed regarding the probabilities of
making wrong decisions, for all practical purposes we may put
A = (1 - β)/α and B = β/(1 - α).

The expression for p_1m/p_0m given in (9:1) can be simplified to

(9:5)    \frac{p_{1m}}{p_{0m}} = e^{-\frac{m\delta^2}{2}} \cdot \frac{e^{\frac{\delta}{\sigma} \sum (x_\alpha - \theta_0)} + e^{-\frac{\delta}{\sigma} \sum (x_\alpha - \theta_0)}}{2} = e^{-\frac{m\delta^2}{2}} \cosh \left[\frac{\delta}{\sigma} \sum_{\alpha=1}^{m} (x_\alpha - \theta_0)\right]

Substituting this value of p_1m/p_0m in (9:2), (9:3), and (9:4) and taking
logarithms, we find that these inequalities become

(9:6)    \log B + m \frac{\delta^2}{2} < \log \cosh \left[\frac{\delta}{\sigma} \sum_{\alpha=1}^{m} (x_\alpha - \theta_0)\right] < \log A + m \frac{\delta^2}{2}

(9:7)    \log \cosh \left[\frac{\delta}{\sigma} \sum (x_\alpha - \theta_0)\right] \le \log B + m \frac{\delta^2}{2}

(9:8)    \log \cosh \left[\frac{\delta}{\sigma} \sum (x_\alpha - \theta_0)\right] \ge \log A + m \frac{\delta^2}{2}

With the use of inequalities (9:6), (9:7), and (9:8), the test
procedure is carried out as follows. At each stage of the experiment we
compute Z_m = log cosh [(δ/σ) Σ_{α=1}^{m} (x_α - θ_0)]. The first time that
Z_m does not lie between log B + [m(δ²/2)] and log A + [m(δ²/2)] we
terminate the process. The hypothesis that θ = θ_0 is accepted if
Z_m ≤ log B + [m(δ²/2)], and rejected if Z_m ≥ log A + [m(δ²/2)].

The computation of Z_m at each stage of the experiment is somewhat
cumbersome. However, if |(δ/σ) Σ(x_α - θ_0)| is greater than 3,
Z_m = log cosh [(δ/σ) Σ(x_α - θ_0)] is very nearly equal to
|(δ/σ) Σ(x_α - θ_0)| - log 2.¹ When this approximation to Z_m is used,
inequalities (9:6), (9:7), and (9:8) simplify to

(9:9)    \frac{\sigma}{\delta} (\log B + \log 2) + m \frac{\sigma\delta}{2} < \left|\sum (x_\alpha - \theta_0)\right| < \frac{\sigma}{\delta} (\log A + \log 2) + m \frac{\sigma\delta}{2}

(9:10)    \left|\sum (x_\alpha - \theta_0)\right| \le \frac{\sigma}{\delta} (\log B + \log 2) + m \frac{\sigma\delta}{2}

and

(9:11)    \left|\sum (x_\alpha - \theta_0)\right| \ge \frac{\sigma}{\delta} (\log A + \log 2) + m \frac{\sigma\delta}{2}

respectively. For all practical purposes inequalities (9:9), (9:10), and
(9:11) may be used instead of (9:6), (9:7), and (9:8) whenever
(δ/σ) |Σ(x_α - θ_0)| ≥ 3.
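
The test based on Z_m, and the quality of the approximation log cosh u ≈ |u| - log 2, can be examined with a short Python sketch. It is an added illustration; the function names and the values θ_0 = 0, σ = 1, δ = 1, α = β = 0.05 in the usage lines are assumptions. The helper evaluates log cosh through the identity log cosh u = |u| + log(1 + e^{-2|u|}) - log 2, which avoids overflow for large |u|.

```python
import math


def log_cosh(u):
    """log cosh(u), written so that large |u| does not overflow."""
    u = abs(u)
    return u + math.log1p(math.exp(-2.0 * u)) - math.log(2.0)


def run_two_sided_test(xs, theta0, sigma, delta, alpha, beta):
    """Sequential test of theta = theta0 using inequalities (9:6)-(9:8)."""
    log_a = math.log((1.0 - beta) / alpha)
    log_b = math.log(beta / (1.0 - alpha))
    total = 0.0
    for m, x in enumerate(xs, start=1):
        total += x - theta0
        z_m = log_cosh(delta / sigma * total)
        drift = m * delta ** 2 / 2.0
        if z_m <= log_b + drift:
            return "accept", m
        if z_m >= log_a + drift:
            return "reject", m
    return "continue", len(xs)


# Error of the approximation |u| - log 2 at the threshold u = 3.
approx_error_at_3 = log_cosh(3.0) - (3.0 - math.log(2.0))

# Assumed illustrative inputs.
on_target = run_two_sided_test([0.0] * 10, 0.0, 1.0, 1.0, 0.05, 0.05)
off_target = run_two_sided_test([3.0] * 5, 0.0, 1.0, 1.0, 0.05, 0.05)
```

At u = 3 the approximation error equals log(1 + e^{-6}), which is about 0.0025 and shrinks rapidly as |u| grows; this is the sense in which Z_m is "very nearly equal" to the simpler expression.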

The following is an alternative computational procedure which may
be found useful. Consider the equation in u

(9:12)    \log \cosh |u| = v

This has exactly one non-negative solution |u| if v ≥ 0. The root of
this equation is given by

(9:13)    |u| = \phi(v) = \log \left(e^{v} + \sqrt{e^{2v} - 1}\right)

1 See also SRG 265, p. B.15.

The function φ(v) can easily be tabulated. In terms of the function
φ(v), inequalities (9:6), (9:7), and (9:8) can be written as

(9:14)    \frac{\sigma}{\delta} \phi \left(\log B + m \frac{\delta^2}{2}\right) < \left|\sum (x_\alpha - \theta_0)\right| < \frac{\sigma}{\delta} \phi \left(\log A + m \frac{\delta^2}{2}\right)

(9:15)    \left|\sum (x_\alpha - \theta_0)\right| \le \frac{\sigma}{\delta} \phi \left(\log B + m \frac{\delta^2}{2}\right)

and

(9:16)    \left|\sum (x_\alpha - \theta_0)\right| \ge \frac{\sigma}{\delta} \phi \left(\log A + m \frac{\delta^2}{2}\right)

When inequalities (9:14), (9:15), and (9:16) are used, the test can
be carried out as follows. For each integral value m we compute the
acceptance number

(9:17)    a_m = \frac{\sigma}{\delta} \phi \left(\log B + m \frac{\delta^2}{2}\right)

and the rejection number

(9:18)    r_m = \frac{\sigma}{\delta} \phi \left(\log A + m \frac{\delta^2}{2}\right)

These acceptance and rejection numbers can be computed before
experimentation starts. Additional observations are taken as long as
a_m < |Σ(x_α - θ_0)| < r_m. If |Σ(x_α - θ_0)| ≤ a_m the hypothesis that
θ = θ_0 is accepted and if |Σ(x_α - θ_0)| ≥ r_m the hypothesis that θ = θ_0
is rejected.
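
The function φ(v) of (9:13) and the numbers (9:17) and (9:18) are easy to compute directly. The Python sketch below is an added illustration with hypothetical function names and assumed parameter values. One point goes beyond what the text states and is an interpretation made here: when log B + m δ²/2 is negative, φ is undefined, and since log cosh is never negative no acceptance is possible at that stage, so the sketch returns None for a_m.

```python
import math


def phi(v):
    """Non-negative root |u| of log cosh|u| = v for v >= 0, eq. (9:13)."""
    if v < 0:
        raise ValueError("phi(v) is defined only for v >= 0")
    return math.log(math.exp(v) + math.sqrt(math.exp(2.0 * v) - 1.0))


def two_sided_numbers(m, sigma, delta, alpha, beta):
    """Acceptance and rejection numbers a_m, r_m of (9:17) and (9:18).

    a_m is None while log B + m*delta^2/2 < 0 (interpretation: acceptance
    is not yet possible, because log cosh cannot be negative).
    """
    log_a = math.log((1.0 - beta) / alpha)
    log_b = math.log(beta / (1.0 - alpha))
    drift = m * delta ** 2 / 2.0
    v_accept = log_b + drift
    a_m = sigma / delta * phi(v_accept) if v_accept >= 0 else None
    r_m = sigma / delta * phi(log_a + drift)
    return a_m, r_m


# Assumed illustrative parameters: sigma = 1, delta = 1, alpha = beta = 0.05.
table = [two_sided_numbers(m, 1.0, 1.0, 0.05, 0.05) for m in range(1, 11)]
```

Because the numbers depend only on m and on the chosen constants, such a table can be prepared before experimentation starts, as the text notes.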



PART III. THE PROBLEM OF MULTI-VALUED DECISIONS 
AND ESTIMATION 


Chapter 10. THE CHOICE OF A HYPOTHESIS FROM A SET
OF MUTUALLY EXCLUSIVE HYPOTHESES (MULTI-VALUED DECISION)

10.1 Formulation of the Problem 

Part I has been devoted exclusively to the discussion of the problem
of testing a statistical hypothesis. In such problems only one of two
possible decisions can be made: the hypothesis is either rejected or
accepted. Thus, we can say that testing a hypothesis is a two-valued
decision problem, since the decision can take only the two values:
acceptance and rejection. Let H̄ denote the negation of the hypothesis
H to be tested. Then testing the hypothesis H is the same as choosing
between the two competing hypotheses H and H̄.

It has been pointed out in Section 1.3.5 that testing a hypothesis H
arises frequently as a consequence of the problem of deciding between
two alternative courses of action, say action 1 and action 2. Suppose
that the preference for one or the other action depends on the value
of an unknown parameter θ of the distribution of a random variable x.
Let ω denote the set of all values of θ for which action 1 is preferred to
action 2 (or at least not less desirable than action 2). If a decision is
to be made on the basis of a finite number of observations on x, this
leads to the problem of testing the hypothesis H that the true value θ
lies in ω. If H is accepted, we decide for action 1, and if H is rejected
we decide for action 2. In applications it happens frequently that there
are more than two alternative courses of action, one of which is to be
chosen. Suppose that there are k (k > 2) alternative actions, say
action 1, action 2, ..., action k, and that one of them is to be chosen
on the basis of some observations on the random variable x. Suppose,
furthermore, that the relative degree of preference for these actions
depends on the value of a parameter θ of the distribution of x. Then
it will be possible, in general, to subdivide the totality of all possible
values of θ into k mutually exclusive parts ω_1, ω_2, ..., ω_k such that
action j is preferable to all other actions i ≠ j if, and only if, the true
value θ lies in ω_j. Let H_j denote the hypothesis that θ lies in ω_j
(j = 1, ..., k). Then the problem of deciding for a particular action
reduces to the problem of choosing one of the hypotheses H_1, ..., H_k.
If H_i is accepted we decide to take action i. Such a problem may be
called a multi-valued decision problem, since the decision to be made
can take k values: We may accept H_1, or H_2, ..., or H_k.

In this section we shall deal with the problem of choosing one out
of k mutually exclusive and exhaustive hypotheses, H_1, ..., H_k, on
the basis of some observations on the random variable x under
consideration.¹ The problem of testing a hypothesis is contained in this
as a special case when k = 2.

The following simple example may serve as an illustration. Suppose
that x is a measurable quality characteristic of a product which is
normally distributed in the population of units produced. Suppose,
furthermore, that the quality of the product is regarded the better the
higher the mean value θ of x. Assume that the following three
alternative actions are under consideration by the manufacturer: (1) to
sell the product at the regular market price, (2) to label the product as
second rate quality and sell it at a reduced price, (3) to withhold the
product from the market. Let a and b (a < b) be two values of θ such
that the manufacturer prefers action 3 if θ ≤ a, he prefers action 2 if
a < θ < b, and he prefers action 1 if θ ≥ b. Let H_1 denote the
hypothesis that θ ≤ a, H_2 the hypothesis that a < θ < b, and H_3 the
hypothesis that θ ≥ b. If the value of θ is unknown and if the
manufacturer must decide which action should be taken on the basis of
some observations on x, he is faced with the multi-valued decision
problem of choosing one of the mutually exclusive hypotheses H_1, H_2,
and H_3.

10.2 The General Nature of a Sequential Sampling Plan for Selecting
a Hypothesis from a Set of Mutually Exclusive Hypotheses

A sequential sampling plan for choosing one of k mutually exclusive
and exhaustive hypotheses H_1, ..., H_k may be described as follows.
A rule is given for making one of the following (k + 1) decisions at
each stage of the experiment (at the mth trial for each integral value
of m): (1) to terminate experimentation with the acceptance of H_1;
(2) to terminate experimentation with the acceptance of H_2; ...; (k)
to terminate experimentation with the acceptance of H_k; (k + 1) to
continue the experiment by making an additional observation. Such
a procedure is carried out sequentially. On the basis of the first
observation one of the aforementioned (k + 1) decisions is made. If one
of the first k decisions is made, the process is terminated. If the last
decision is made, a second trial is performed. Again, on the basis of
the first two observations, one of the (k + 1) decisions is made. If
the last decision is made, a third trial is performed, and so on. The
process is continued until one of the first k decisions is made.

1 This problem in the non-sequential case, that is, when the total number of
observations to be made is determined in advance, has been treated in several
previous publications. See, for example, the author's article "Statistical Decision
Functions Which Minimize the Maximum Risk," The Annals of Mathematics,
April, 1945.

In more precise mathematical terms, a sequential sampling plan
may be described as follows. Let R_m denote the totality of all possible
samples of size m, i.e., R_m is the m-dimensional sample space. For
each positive integral value of m, the m-dimensional sample space is
split into (k + 1) mutually exclusive parts, R_{m1}, R_{m2}, ..., R_{mk} and
R_{m,k+1}. If the first observation x_1 lies in R_{1i} where i ≤ k, the process
is terminated with the acceptance of H_i. If x_1 lies in R_{1,k+1} a second
observation x_2 is made. Again, if (x_1, x_2) lies in some R_{2i} with i ≤ k,
the process is terminated with the acceptance of H_i. If (x_1, x_2) lies
in R_{2,k+1} a third trial is performed, and so on. This process is stopped
at the first time when the sample (x_1, ..., x_m) lies in R_{mi} for some
value i ≤ k. Thus, a sequential sampling plan is completely defined
by the sets R_{m1}, ..., R_{m,k+1}. Since these sets are mutually exclusive
and add up to the whole sample space R_m, it is sufficient to define any
k of these sets, since they determine uniquely the remaining set.

For any m, the subdivision of the sample space R_m into the (k + 1)
parts R_{m1}, ..., R_{m,k+1} can be made in many ways, and a fundamental
problem is that of a proper choice of these sets. In order to set up
principles for this choice, in the next section we shall study the
consequences of any particular choice.
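
The general structure described above, a rule that at each stage either accepts one of H_1, ..., H_k or calls for another observation, can be written down as a small Python sketch. It is an added illustration. The driver `run_plan` only encodes the sequential mechanics; the rule `toy_rule` is entirely hypothetical, is not a plan proposed in the text, and is included solely so that the driver has something to call. In `toy_rule` the regions R_{m1}, R_{m2}, R_{m3} are defined through the sample mean for the three-hypothesis example with assumed limits a = 0 and b = 1, and every other sample falls in the continuation region R_{m,4}.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A rule maps the sample (x_1, ..., x_m) to the index i of the accepted
# hypothesis H_i, or to None, meaning "take another observation".
Rule = Callable[[List[float]], Optional[int]]


def run_plan(observations: Iterable[float], rule: Rule) -> Tuple[Optional[int], int]:
    """Apply the rule after each observation until a hypothesis is accepted."""
    sample: List[float] = []
    for x in observations:
        sample.append(x)
        decision = rule(sample)
        if decision is not None:
            return decision, len(sample)
    return None, len(sample)   # data exhausted before any decision


def toy_rule(sample: List[float], a: float = 0.0, b: float = 1.0,
             margin: float = 2.0) -> Optional[int]:
    """Hypothetical rule for H_1: theta <= a, H_2: a < theta < b, H_3: theta >= b.

    The three acceptance regions are disjoint; the remaining samples form
    the continuation region.
    """
    m = len(sample)
    mean = sum(sample) / m
    slack = margin / m            # shrinks as more observations arrive
    if mean <= a - slack:
        return 1
    if mean >= b + slack:
        return 3
    if a + slack < mean < b - slack:
        return 2
    return None


middle = run_plan([0.5] * 10, toy_rule)
low = run_plan([-3.0] * 10, toy_rule)
undecided = run_plan([0.5] * 3, toy_rule)
```

Any other choice of the regions can be tried by passing a different rule to `run_plan`; the question of which choice is appropriate is the subject of the sections that follow.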

10.3 Consequences of the Choice of Any Particular Sequential
Sampling Plan

After a particular choice of the sets R_{m1}, ..., R_{m,k+1} has been made,
i.e., a particular sequential sampling plan has been adopted, for any
i ≤ k the probability that the process will terminate with the
acceptance of H_i depends only on the distribution of the random variable x
under consideration. Since it is assumed that the distribution of x is
known except for the values of a finite number of parameters θ_1, ...,
θ_r, the probability that H_i will be accepted will be a function of these
parameters. To simplify notation, we shall use the letter θ without
subscript to denote the set of all r parameters θ_1, ..., θ_r. Let L_i(θ)
denote the probability that the adopted sequential sampling plan will
terminate with the acceptance of H_i (i = 1, ..., k). We shall refer
to the set of functions L_1(θ), L_2(θ), ..., L_k(θ) as the operating
characteristics of the sampling plan. We shall consider only sampling plans
for which the probability is 1 that the process will eventually
terminate. Then we have

(10:1)    L_1(\theta) + \cdots + L_k(\theta) = 1

and, therefore, one of the functions L_1(θ), ..., L_k(θ) is determined by
the other k - 1.

The operating characteristics represent the accomplishment of the
sampling plan in giving protection against possible wrong decisions.
For any parameter point θ, the probability of accepting the correct
hypothesis, i.e., the hypothesis which is consistent with parameter
point θ, can be obtained immediately from the operating
characteristics. Since the hypotheses H_1, ..., H_k are mutually exclusive and
exhaustive, for any given parameter point θ one, and only one, of the
hypotheses H_1, ..., H_k will be consistent with a given θ. If H_i is the
hypothesis consistent with a given θ, the probability of making a
correct decision when this θ is true is equal to L_i(θ). The operating
characteristics of a sampling plan are considered the more favorable the
higher the probability for making correct decisions for the various
possible parameter points θ.

The price we have to pay for the accomplishment of the sampling
plan in giving protection against wrong decisions is represented by the
number n of observations required by the sampling plan. Since n is
a random variable, we shall consider, as in testing a hypothesis, the
expected value of n. After a particular sampling plan has been
adopted, the expected value of n will be a function of the parameter
point θ only. As in testing hypotheses, we shall denote the expected
value of n, when θ is true, by E_θ(n), and we shall refer to E_θ(n) as the
average sample number (ASN) function of the sampling plan.

In conclusion we may say that the most important consequences of
any particular choice of a sampling plan are given by the operating
characteristics and the ASN function of the adopted sampling plan.
The operating characteristics represent the accomplishments of the
sampling plan and the ASN function represents the price paid for these
accomplishments.

10.4 Principles for the Selection of a Sequential Sampling Plan

10.4.1 Dependence of Importance of Possible Wrong Decisions on 
the Parameter Point 0 

To set up principles for the selection of a sequential sampling plan 
it will be necessary to investigate the dependence of the importance 
of possible wrong decisions on the parameter point. Let a?* denote the 
set of parameter points 0 consistent with Hi (i = 1, • • *, k), i.e., II t is 
precisely the statement that the true parameter point 0 is included in 
(x)i. If the true 0 is in co*- but not far from ooj for some j 9^ i, the accept- 
ance of Hj will not be regarded, in general, as a serious error. How- 
ever, if 0 is far from coy and II j is accepted, the error committed will 
Usually be of considerable practical consequence. 

As an illustration, consider again the example given in Section 10.1. 
The decision to withhold the product from the market will be con- 
sidered an error of little practical significance if 0 is only slightly above 
a. The seriousness of this error will, however, increase with increasing 
value of 0. If 0 is substantially above a, the decision to withhold the 
product will be regarded as an error of considerable practical impor- 
tance. Similarly, the decision to try to sell the product at regular 
market price will not be a serious error if 0 is just slightly below b, 
but the importance of this error will increase with decreasing value 
of 0. 

It will frequently be possible to express the importance of the
various possible wrong decisions by k functions w_1(θ), ..., w_k(θ),
where w_j(θ) is a non-negative function expressing the importance of
the error committed by accepting H_j when θ is true. In industrial
problems, w_j(θ) may be thought of as expressing the financial loss
caused by taking the action corresponding to the acceptance of H_j when
θ is true. We shall, of course, put w_j(θ) = 0 for all points θ in ω_j,
since for such points θ the acceptance of H_j is a correct decision. We
shall refer to the functions w_1(θ), ..., w_k(θ) as error weight
functions, or more briefly as weight functions.

The choice of a sampling plan will be influenced by the weight
functions w_1(θ), ..., w_k(θ). The determination of these weight
functions cannot be regarded as a statistical problem. They will be
chosen on the basis of practical considerations in each particular
problem.

10.4.2 The Risk Function Associated with a Given Sampling Plan 

For any parameter point θ we shall mean by the risk r(θ) the expected
value of the loss caused by possible wrong decisions when θ is true.
Since the probability of accepting H_i is equal to L_i(θ) and since
the loss caused by this decision is given by w_i(θ), the expected value
of the loss is equal to

(10:2)    r(\theta) = L_1(\theta) w_1(\theta) + L_2(\theta) w_2(\theta) + \cdots + L_k(\theta) w_k(\theta)

We shall refer to r(θ) as the risk function of the sampling plan. 2

We shall judge the relative merits of a sampling plan by its risk
function r(θ) and ASN function E_θ(n).
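
The sum in (10:2) can be sketched numerically as follows. This is only
an illustration under stated assumptions: the two-hypothesis problem,
the acceptance probabilities L1 and L2, and the weight functions w1 and
w2 below are hypothetical and are not taken from the text.

```python
import math

def risk(theta, accept_probs, weights):
    """Risk (10:2): sum over i of L_i(theta) * w_i(theta)."""
    return sum(L(theta) * w(theta) for L, w in zip(accept_probs, weights))

# Hypothetical two-decision problem: H_1 says theta <= 0, H_2 says theta > 0.
def L1(theta):
    """Assumed probability that the plan accepts H_1 when theta is true."""
    return 1.0 / (1.0 + math.exp(4.0 * theta))

def L2(theta):
    """The plan accepts exactly one hypothesis, so L_1 + L_2 = 1."""
    return 1.0 - L1(theta)

def w1(theta):
    """Assumed loss from accepting H_1; zero wherever H_1 is true."""
    return max(theta, 0.0)

def w2(theta):
    """Assumed loss from accepting H_2; zero wherever H_2 is true."""
    return max(-theta, 0.0)

for theta in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(theta, round(risk(theta, [L1, L2], [w1, w2]), 4))
```

In this made-up case the risk vanishes at θ = 0, where neither decision
is a serious error, and again far from 0, where a wrong decision is
costly but very unlikely.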

10.4.3 The Risk Function and the ASN Function as a Basis for the 
Selection of a Sequential Sampling Plan 

A sequential sampling plan is the better the smaller the risk r(θ) and
the smaller the expected value E_θ(n) of the number of observations.
These two desiderata of a sampling plan are somewhat in conflict, since
the smaller we make r(θ), the larger, in general, will be the number of
observations required by the plan. To achieve a reasonable compromise
between these two conflicting desiderata, one may proceed as follows.
First we impose the condition that the risk r(θ) shall not exceed a
certain prescribed positive value r_0, i.e.,

(10:3)    r(\theta) \le r_0

for all parameter points θ. We then consider only sampling plans for
which (10:3) is fulfilled. From this class of sampling plans we try to
select one for which E_θ(n) is as small as possible.

To impose first the condition (10:3) and then to try to minimize with
respect to the expected number of observations does not seem to be
an unreasonable procedure, since the risk function r(θ) is perhaps of
primary importance. 3

The choice of the upper limit r_0 of the risk is not a statistical
problem. It will be determined on the basis of practical considerations
in each particular case.

2 Another possible definition of the risk function could be given by including also
the expected value of the cost of experimentation. If c denotes the cost of taking
a single observation, the expected value of the cost of experimentation is equal to
cE_θ(n) and the risk is given by

(10:2*)    r^*(\theta) = \sum_{i=1}^{k} L_i(\theta) w_i(\theta) + c E_\theta(n)

If the cost of experimentation is not proportional to the number of observations,
but is given by the cost function c(n), then the term cE_θ(n) in (10:2*) is to be
replaced by E_θ[c(n)].

3 Using the risk function r*(θ), as given in (10:2*), a sampling plan for which the
maximum value of r*(θ) with respect to θ is minimized may be regarded as an
optimum plan. If this definition of an optimum sampling plan is accepted, no
condition of the type (10:3) is imposed; we simply try to find a plan for which the
maximum of r*(θ) with respect to θ takes the smallest possible value.

10.4.4 The Use of Certain Simple Weight Functions

The construction of specific weight functions w_1(θ), ..., w_k(θ) in a
given problem may occasionally run into practical difficulties.
Although in industrial problems w_j(θ) could be assumed to be equal to
the financial loss (or estimated financial loss) caused by the acceptance
of H_j when θ is true, in purely scientific investigations it is rather
difficult to give a reasonable measure of the loss caused by accepting a
wrong hypothesis.

Even if the difficulties in measuring the loss caused by possible wrong
decisions are disregarded, we still face the practical difficulty that the
weight functions w_1(θ), ..., w_k(θ) in a given problem may be too
involved to be manageable. Thus, there is a need for simplification.

The choice of the sampling plan is usually not very dependent on
the exact shape of the weight functions. It will, therefore, be
frequently satisfactory to use some rough approximations, reproducing
only the main features of the weight functions. A very rough, but
for many applications satisfactory, approximation can be obtained by
replacing w_j(θ) by w̄_j(θ) defined as follows:

(10:4)    \bar{w}_j(\theta) = \begin{cases} 0 & \text{if } w_j(\theta) \le c_j \\ c & \text{if } w_j(\theta) > c_j \end{cases}

where c_j is a certain value and c is some positive constant. Thus,
w̄_j(θ) can take only two values, 0 and c. There is no loss of
generality in putting c = 1, since this can be achieved by
multiplication by a proportionality factor which has no effect on the
selection of the sampling plan.
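
The replacement in (10:4) amounts to thresholding a weight function.
A minimal sketch follows; the helper name two_valued and the example
loss function are assumptions made for illustration only.

```python
def two_valued(w, cutoff, c=1.0):
    """Return the two-valued approximation of w as in (10:4).

    w      -- original weight function w_j(theta)
    cutoff -- the value c_j below which the loss is treated as negligible
    c      -- the common positive value assigned to serious errors
    """
    def w_bar(theta):
        return 0.0 if w(theta) <= cutoff else c
    return w_bar

# Hypothetical loss: grows linearly once theta exceeds 1.
def loss(theta):
    return 100.0 * max(theta - 1.0, 0.0)

w_bar = two_valued(loss, cutoff=20.0)
print([w_bar(t) for t in (0.5, 1.1, 1.5, 3.0)])
```

Only the set of points where the loss exceeds the cutoff matters after
this step; the exact shape of the original loss no longer enters.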

In what follows in this and the following section, we shall consider
only the weight functions w̄_j(θ). We shall call the set of all parameter
points θ for which w̄_i(θ) = 0 and w̄_j(θ) = 1 for j ≠ i the zone of
preference for acceptance of H_i. The set of points θ for which
w̄_i(θ) = w̄_j(θ) = 0 and w̄_k(θ) = 1 for k ≠ i, j will be called the zone
of indifference between H_i and H_j. Similarly, the set of points θ for
which w̄_i(θ) = w̄_j(θ) = w̄_m(θ) = 0 and w̄_l(θ) = 1 for l ≠ i, j, m will be
called the zone of indifference among the hypotheses H_i, H_j, and H_m,
and so on.

If we deal with the problem of testing a hypothesis H, then k = 2,
H_1 = H, and H_2 is equal to the negation H̄ of H. The zone of
preference for acceptance of H, the zone of preference for acceptance
of H̄, and the zone of indifference between H and H̄ defined here
correspond to the zone of preference for acceptance, zone of preference
for rejection, and zone of indifference discussed in Section 2.3.1.

To illustrate the meaning of the various zones defined here, we
consider again the example discussed in Section 10.1. In this example
H_1 is the hypothesis that θ ≤ a, H_2 is the hypothesis that a < θ < b,
and H_3 is the hypothesis that θ ≥ b. The functions w̄_1(θ), w̄_2(θ), and
w̄_3(θ) may reasonably be defined as follows:

    \bar{w}_1(\theta) = 0 \text{ for } \theta < a + \Delta, \quad = 1 \text{ for } \theta \ge a + \Delta

    \bar{w}_2(\theta) = 0 \text{ if } a - \Delta < \theta < b + \Delta, \quad = 1 \text{ elsewhere}

    \bar{w}_3(\theta) = 0 \text{ if } \theta \ge b - \Delta, \quad = 1 \text{ elsewhere}

where Δ is a certain positive quantity.

Then the zone of preference for acceptance of H_1 is the set of values
of θ for which θ ≤ a - Δ. The zone of preference for acceptance of
H_2 is given by the inequality a + Δ ≤ θ < b - Δ, and the zone of
preference for acceptance of H_3 by θ ≥ b + Δ. The zone of
indifference between H_1 and H_2 is given by the inequality
a - Δ < θ < a + Δ, the zone of indifference between H_1 and H_3 is
empty, and the zone of indifference between H_2 and H_3 is given by
b - Δ ≤ θ < b + Δ. Finally, the zone of indifference among H_1, H_2,
and H_3 is empty.
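
The zones of this example can be read off mechanically from the three
two-valued weight functions. The sketch below does so; the function
names and the numerical values of a, b, and Δ are illustrative
assumptions.

```python
def weights(theta, a, b, Delta):
    """Two-valued weight functions of the three-hypothesis example."""
    w1 = 0 if theta < a + Delta else 1
    w2 = 0 if a - Delta < theta < b + Delta else 1
    w3 = 0 if theta >= b - Delta else 1
    return (w1, w2, w3)

def zone(theta, a, b, Delta):
    """Classify theta by the hypotheses whose acceptance is not an error."""
    acceptable = tuple(i + 1 for i, w in enumerate(weights(theta, a, b, Delta))
                       if w == 0)
    kind = "preference" if len(acceptable) == 1 else "indifference"
    return kind, acceptable

# Assumed values: a = 1, b = 2, Delta = 0.25.
for theta in (0.5, 1.0, 1.5, 1.9, 2.5):
    print(theta, zone(theta, 1.0, 2.0, 0.25))
```

A point is in a zone of preference when exactly one weight function
vanishes there, and in a zone of indifference when two or more do.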

When the weight functions w̄_1(θ), ..., w̄_k(θ) are used, the risk
function r(θ) defined in (10:2) takes a particularly simple form. Since
w̄_j(θ) can take only the values 0 and 1, we shall have

(10:5)    r(\theta) = \sum_{j} L_j(\theta)

where the summation is to be taken for all values of j for which
w̄_j(θ) = 1.

We shall say that a wrong decision is made if, and only if, a
hypothesis H_i is accepted for which w̄_i(θ) = 1. Then the risk r(θ)
given in (10:5) is simply equal to the probability that a wrong
decision will be made.

The principle for the selection of a sequential sampling plan, as
stated in Section 10.4.3, can now be formulated as follows. We
consider only sequential sampling plans for which the probability of
making a wrong decision does not exceed a certain preassigned value r_0.
From the class of such sequential sampling plans we try to select one
for which the expected value of the number of observations required
by the plan is as small as possible.

10.5 Discussion of a Special Class of Sequential Sampling Plans 

The problem of finding a sequential sampling plan which may be
regarded as an optimum plan in the sense of the previous section is
not yet solved. However, as will be shown in this section, a wide class
of sequential sampling plans can be constructed for which the
condition that the probability of making a wrong decision should not
exceed a preassigned value r_0 is fulfilled.

To construct such a class of sampling plans we shall make use of 
the following lemma. 

Lemma. Let x_1, x_2, ..., etc., be a sequence of variates, let
p_{1m}(x_1, ..., x_m) (m = 1, 2, ...) denote the joint probability
density function of x_1, ..., x_m under the hypothesis H_1, and let
p_{0m}(x_1, ..., x_m) be the density function under the hypothesis
H_0. 4 Let, furthermore, A be a constant greater than one. Then, under
the hypothesis H_0, the probability that

(10:6)    \frac{p_{1m}(x_1, \ldots, x_m)}{p_{0m}(x_1, \ldots, x_m)} < A

will hold for all values of m is greater than or equal to 1 - (1/A).

The validity of this lemma can easily be shown with the help of the 
inequalities given in Section 3.2 by letting the constant B in those in- 
equalities approach 0. 
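
The lemma can be checked empirically. The following is a rough Monte
Carlo sketch, not a proof. It assumes independent unit-variance normal
observations with mean 0 under H_0 and mean mu1 under H_1 (a choice
made here for illustration), and it truncates each sequence at a finite
horizon, which can only overstate the fraction of sequences that stay
below A.

```python
import math
import random

def fraction_staying_below(A, mu1=1.0, horizon=200, trials=2000, seed=0):
    """Estimate P(ratio (10:6) < A for all m <= horizon) under H_0."""
    rng = random.Random(seed)
    log_A = math.log(A)
    stayed = 0
    for _ in range(trials):
        log_ratio = 0.0
        crossed = False
        for _ in range(horizon):
            x = rng.gauss(0.0, 1.0)              # one observation under H_0
            # log of N(mu1, 1) density over N(0, 1) density at x
            log_ratio += mu1 * x - 0.5 * mu1 ** 2
            if log_ratio >= log_A:
                crossed = True
                break
        if not crossed:
            stayed += 1
    return stayed / trials

A = 10.0
print(fraction_staying_below(A), ">=", 1.0 - 1.0 / A)
```

The estimated fraction should come out no smaller than 1 - (1/A), up to
sampling error; in practice it tends to be somewhat larger, because the
ratio overshoots A when it crosses.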

With the help of this lemma we can construct a sequential sampling
plan satisfying the condition that the probability of making a wrong
decision does not exceed a prescribed value r_0 as follows. Let
p_m(x_1, ..., x_m, θ) be equal to f(x_1, θ) f(x_2, θ) ··· f(x_m, θ)
where f(x, θ) is the probability distribution of x when θ is true. For
any parameter point θ let p_m*(x_1, ..., x_m, θ) be an arbitrary but
given probability distribution of the variates x_1, x_2, ..., x_m. 5
Then according to our lemma the probability that

(10:7)    \frac{p_m^*(x_1, \ldots, x_m, \theta)}{p_m(x_1, \ldots, x_m, \theta)} < A

will hold for all m is greater than or equal to 1 - (1/A) when θ is true.
For any sample point E_n = (x_1, ..., x_n), let ω_n(E_n) denote the
totality of all parameter points θ for which the inequality (10:7) is
fulfilled for all values m ≤ n. Clearly, the probability that the true
parameter point θ will be included in all sets ω_n(E_n) (n = 1, 2, ...,
ad inf.) is greater than or equal to 1 - (1/A). The sequential sampling
plan is then defined as follows: We continue taking additional
observations as long as none of the weight functions w̄_1(θ), ..., w̄_k(θ)
is identically zero in ω_n(E_n). At the first time when ω_n(E_n) is such
that at least one


4 If the distribution of x_1, x_2, ..., etc. is discrete, p_{1m}(x_1, ..., x_m) denotes the
probability of obtaining a sample equal to the observed.

5 It is understood that the distribution of x_1, ..., x_m determined from the
distribution p_{m'}*(x_1, ..., x_{m'}, θ) (m' > m) is identical with p_m*(x_1, ..., x_m, θ).

of the weight functions w̄_1(θ), ..., w̄_k(θ) is identically 0 in ω_n(E_n), we
stop the process with the acceptance of the hypothesis corresponding
to the weight function which is identically zero in ω_n(E_n). 6 Obviously,
this sequential sampling plan will have the property that the
probability of making a wrong decision does not exceed 1/A. If we let
A equal 1/r_0, then the probability of making a wrong decision will not
exceed r_0, as required.

This method leads to a wide class C of sequential sampling
plans with the required property, since the distribution function
p_m*(x_1, ..., x_m, θ) in the numerator of (10:7) can be chosen entirely
arbitrarily. It is doubtful whether this class C of sampling plans
contains an optimum plan in the sense of the definition given in 10.4.
If we are willing to restrict ourselves to sampling plans in class C, we
still have the problem of so choosing p_m*(x_1, ..., x_m, θ) as to make
the expected number of observations required by the plan as small as
possible. This problem, too, has not yet been solved. There may be some
waste involved in letting A = 1/r_0, since this may result in a maximum
probability of making a wrong decision that is considerably less than
the tolerated value r_0. A further development of the theory may show
that A can be put equal to some value smaller than 1/r_0 which would
lead to a saving in the number of observations.

Although the present stage of the theory is very incomplete, sampling
plans based on the inequality (10:7) may still be used with good
advantage in some problems. Even if we cannot yet find the best
distribution p_m*(x_1, ..., x_m, θ) to be used in the numerator of
(10:7), we still may be able to make a reasonably good choice of
p_m*(x_1, ..., x_m, θ) and thereby obtain a sequential plan which
requires, on the average, a substantially smaller number of
observations than the best possible non-sequential sampling plan based
on a predetermined number of observations.

Regarding possible choices of p_m*(x_1, ..., x_m, θ) which may give
reasonably good results, the following remarks may be made. A good
result may be obtained in some problems by letting p_m*(x_1, ..., x_m, θ)
equal a properly chosen weighted average of p_m(x_1, ..., x_m, ζ) where
ζ is a variable parameter point. In other words, we let 7

(10:8)    p_m^*(x_1, \ldots, x_m, \theta) = \int_{\Omega} p_\theta(\zeta)\, p_m(x_1, \ldots, x_m, \zeta)\, d\zeta

6 If there are several weight functions which are identically 0 in ω_n(E_n), we may
choose arbitrarily one from among the hypotheses corresponding to these weight
functions.

7 The averaging function p_θ(ζ) may also be discrete. Formulas valid for both
continuous and discrete averaging functions could be given by using Stieltjes
integrals.

where the integration is taken over the whole parameter space Ω and
p_θ(ζ) is a non-negative function of ζ satisfying the condition

(10:9)    \int_{\Omega} p_\theta(\zeta)\, d\zeta = 1

The choice of the averaging function p_θ(ζ) will depend on the weight
functions w̄_1(θ), ..., w̄_k(θ). If, for example, w̄_j(θ) = 0 for the
parameter point θ under consideration, it seems reasonable to let
p_θ(ζ) = 0 for all parameter points ζ for which w̄_j(ζ) = 0, since we are
not interested in discriminating between parameter points for which the
same decision is correct.

The following is another possible choice of p_m*(x_1, ..., x_m, θ) which
may lead to good results in some problems:

(10:10)    p_m^*(x_1, \ldots, x_m, \theta) = \phi(x_1, \theta)\, f(x_2, \hat{\theta}_1)\, f(x_3, \hat{\theta}_2) \cdots f(x_m, \hat{\theta}_{m-1})

where θ̂_r is the maximum likelihood estimate of θ based on the first r
observations x_1, ..., x_r and φ(x_1, θ) is some suitably chosen
probability distribution of x_1.

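To illustrate the sampling procedure based on (10:7), we shall
consider the following simple example. Let x be normally distributed
with unknown mean θ and unit variance. Then

(10:11)    p_m(x_1, \ldots, x_m, \theta) = \frac{1}{(2\pi)^{m/2}}\, e^{-\frac{1}{2} \sum_{\alpha=1}^{m} (x_\alpha - \theta)^2}

Let

(10:12)    p_m^*(x_1, \ldots, x_m, \theta) = \tfrac{1}{2} \left[ p_m(x_1, \ldots, x_m, \theta + \delta) + p_m(x_1, \ldots, x_m, \theta - \delta) \right]

where δ is a given positive quantity. Then

(10:13)    \frac{p_m^*(x_1, \ldots, x_m, \theta)}{p_m(x_1, \ldots, x_m, \theta)} = \tfrac{1}{2}\, e^{-\frac{1}{2} m \delta^2} \left[ e^{\delta \sum (x_\alpha - \theta)} + e^{-\delta \sum (x_\alpha - \theta)} \right] = e^{-\frac{1}{2} m \delta^2} \cosh\left[ \delta \sum (x_\alpha - \theta) \right]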

The equation

(10:14)    \cosh u = v \qquad (v > 1)

has two roots in u which are equal in absolute value. Let ψ(v) be the
positive, and -ψ(v) the negative root of (10:14). Then the roots of
the equation in θ

(10:15)    e^{-\frac{1}{2} m \delta^2} \cosh\left[ \delta \sum (x_\alpha - \theta) \right] = A

are given by

(10:16)    \theta_1(E_m) = \bar{x}_m + \frac{\psi\left( e^{\frac{1}{2} m \delta^2} A \right)}{m \delta} \quad \text{and} \quad \theta_2(E_m) = \bar{x}_m - \frac{\psi\left( e^{\frac{1}{2} m \delta^2} A \right)}{m \delta}

where x̄_m is the arithmetic mean of the observations x_1, ..., x_m.
The set of all values of θ for which the inequality

    \frac{p_m^*(x_1, \ldots, x_m, \theta)}{p_m(x_1, \ldots, x_m, \theta)} < A

is satisfied is the open interval (θ_2(E_m), θ_1(E_m)). The set ω_n(E_n) is
defined as the common part of the open intervals (θ_2(E_1), θ_1(E_1)), ...,
(θ_2(E_n), θ_1(E_n)). Hence ω_n(E_n) is equal to the open interval whose
lower endpoint is equal to the maximum of the values θ_2(E_1), ...,
θ_2(E_n), and whose upper endpoint is equal to the minimum of the
values θ_1(E_1), ..., θ_1(E_n). 8 Experimentation is terminated the first
time the open interval ω_n(E_n) is such that one of the weight functions
w̄_1(θ), ..., w̄_k(θ) is identically zero in ω_n(E_n).
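
The procedure just described can be sketched in a few lines. The
sketch below is an illustration under assumptions, not a definitive
implementation: it pairs the interval (10:16) with the three two-valued
weight functions of the example in Section 10.4.4, and the names
run_plan, psi_from_log, and all numerical inputs are hypothetical.

```python
import math

def psi_from_log(log_v):
    """Positive root u of cosh(u) = v, given log(v) with v > 1."""
    if log_v > 20.0:
        # For large v, acosh(v) is log(2v) to well within float accuracy.
        return log_v + math.log(2.0)
    return math.acosh(math.exp(log_v))

def run_plan(xs, a, b, Delta, delta, A):
    """Return (n, accepted hypothesis) or None if xs is exhausted first."""
    lo, hi = -math.inf, math.inf      # endpoints of omega_n(E_n)
    total = 0.0
    for n, x in enumerate(xs, start=1):
        total += x
        mean = total / n
        half = psi_from_log(math.log(A) + 0.5 * n * delta ** 2) / (n * delta)
        lo = max(lo, mean - half)     # maximum of the theta_2(E_m)
        hi = min(hi, mean + half)     # minimum of the theta_1(E_m)
        # A weight function is identically zero on the open interval
        # (lo, hi) when the interval lies inside its zero set.  If the
        # interval is empty, one of these tests still succeeds.
        if hi <= a + Delta:
            return n, 1
        if lo >= a - Delta and hi <= b + Delta:
            return n, 2
        if lo >= b - Delta:
            return n, 3
    return None

# Artificial data: fifty observations all equal to 0, with a = 1, b = 2.
print(run_plan([0.0] * 50, a=1.0, b=2.0, Delta=0.2, delta=0.5, A=20.0))
```

With A = 20 the probability of a wrong decision does not exceed
1/20 by the argument of this section. Note that the half-width in
(10:16) tends to δ/2 rather than to 0 as m grows, so the choice of δ
relative to Δ affects how quickly a decision is reached.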

As another illustration, consider again the example given in Section
10.1, and for simplicity assume that the standard deviation of x is
equal to 1. Although the proper choice of p_m*(x_1, ..., x_m, θ) for this
example has not been thoroughly investigated, the following choice of
p_m*(x_1, ..., x_m, θ) is perhaps not unreasonable. A parameter point θ
in the zone of preference for acceptance of H_1, i.e., a value
θ ≤ a - Δ, 9 should be discriminated against all other parameter values
ζ for which acceptance of H_1 is a wrong decision. The smallest value ζ
for which acceptance of H_1 is a wrong decision, i.e., the smallest ζ
for which w̄_1(ζ) = 1, is ζ = a + Δ. Thus, we put

(10:17)    p_m^*(x_1, \ldots, x_m, \theta) = p_m(x_1, \ldots, x_m, a + \Delta) \qquad \text{for all } \theta \le a - \Delta


If θ is in the zone of indifference between H_1 and H_2, i.e., if
a - Δ < θ < a + Δ, we want to discriminate θ against values ζ for which
acceptance of H_1, as well as of H_2, is a wrong decision. The smallest
value of this kind is ζ = b + Δ. Thus, we let

(10:18)    p_m^*(x_1, \ldots, x_m, \theta) = p_m(x_1, \ldots, x_m, b + \Delta) \qquad \text{if } a - \Delta < \theta < a + \Delta

8 If it happens that the upper endpoint determined in this way is less than the
lower endpoint, the set ω_n(E_n) is empty.

9 For a definition of the various zones and weight functions w̄_1(θ), w̄_2(θ), and
w̄_3(θ) for this example see Section 10.4.4.

If θ is in the zone of preference for acceptance of H_2, i.e., if
a + Δ ≤ θ < b - Δ, we want to discriminate it against values ζ for which
acceptance of H_2 is wrong. The greatest value ζ of this kind to the
left of a + Δ is ζ = a - Δ, and the smallest ζ of this kind to the right
of b - Δ is ζ = b + Δ. It seems, therefore, reasonable to let

(10:19)    p_m^*(x_1, \ldots, x_m, \theta) = \tfrac{1}{2} \left[ p_m(x_1, \ldots, x_m, a - \Delta) + p_m(x_1, \ldots, x_m, b + \Delta) \right] \qquad \text{if } a + \Delta \le \theta < b - \Delta

If θ is in the zone of indifference between H_2 and H_3, i.e., if
b - Δ ≤ θ < b + Δ, we want to discriminate θ against values ζ for
which the acceptance of H_2, as well as of H_3, is wrong. Thus, we let

(10:20)    p_m^*(x_1, \ldots, x_m, \theta) = p_m(x_1, \ldots, x_m, a - \Delta) \qquad \text{if } b - \Delta \le \theta < b + \Delta

Finally, if θ is in the zone of preference for acceptance of H_3, i.e., if
θ ≥ b + Δ, we want to discriminate θ against values ζ for which the
acceptance of H_3 is wrong. The least upper bound of values of ζ of
this kind is ζ = b - Δ. Thus, we shall let

(10:21)    p_m^*(x_1, \ldots, x_m, \theta) = p_m(x_1, \ldots, x_m, b - \Delta) \qquad \text{for } \theta \ge b + \Delta
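
The five cases (10:17) to (10:21) can be collected into one lookup, and
the ratio in (10:7) then evaluated for unit-variance normal
observations. The sketch below is illustrative only; the function names
and the numerical values are assumptions, and the ratio is computed on
the logarithmic scale for numerical stability.

```python
import math

def comparison_points(theta, a, b, Delta):
    """Points zeta and mixing weights defining p_m* for a given theta."""
    if theta <= a - Delta:
        return [(a + Delta, 1.0)]                          # (10:17)
    if theta < a + Delta:
        return [(b + Delta, 1.0)]                          # (10:18)
    if theta < b - Delta:
        return [(a - Delta, 0.5), (b + Delta, 0.5)]        # (10:19)
    if theta < b + Delta:
        return [(a - Delta, 1.0)]                          # (10:20)
    return [(b - Delta, 1.0)]                              # (10:21)

def log_ratio(xs, theta, a, b, Delta):
    """log of p_m*/p_m in (10:7) for normal x with unit variance."""
    m = len(xs)
    s = sum(x - theta for x in xs)
    # log of w * p_m(x, zeta) / p_m(x, theta), with d = zeta - theta
    terms = [math.log(w) + (z - theta) * s - 0.5 * m * (z - theta) ** 2
             for z, w in comparison_points(theta, a, b, Delta)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# theta belongs to omega_n(E_n) only while log_ratio stays below log A.
print(log_ratio([1.4, 1.7, 1.5], theta=1.5, a=1.0, b=2.0, Delta=0.25))
```

Because the choice of comparison point jumps from zone to zone, the set
of θ satisfying (10:7) need not be a single interval under this choice.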

It should be remembered that there is no systematic theory yet
available for the proper choice of p_m*(x_1, ..., x_m, θ). The choice of
p_m*(x_1, ..., x_m, θ) in the above example has been made only on
intuitive grounds. It may well be that another choice of
p_m*(x_1, ..., x_m, θ) exists which leads to much better results. It
should also be remarked that it is doubtful whether an optimum sampling
plan, as defined in the preceding section, is a member of the class of
sampling plans based on the inequality (10:7). Further investigations
are needed to clarify these questions.

11.1 Principles of the Current Theory of Estimation by Intervals or
Sets

In this section we shall give a brief outline of the basic ideas of
estimation by intervals or sets as developed by J. Neyman. 1 Consider
first the case in which the distribution of the random variable x under
consideration is known except for the value of a single parameter θ.
The problem treated in the current theory is that of estimating the
value of θ on the basis of a fixed number of observations, say N
observations x_1, ..., x_N on x.

Let E denote the sample (x_1, ..., x_N) and let θ̲(E) and θ̄(E) be two
single-valued functions of the sample E such that

(11:1)    \underline{\theta}(E) \le \bar{\theta}(E) \qquad \text{for all possible samples } E

Let δ(E) denote the interval extending from θ̲(E) to θ̄(E). We shall
refer to δ(E) also as an interval function, since it associates an interval
with each sample. Since the interval δ(E) is a function of the sample,
its location and length will, in general, be random variables and,
therefore, probability statements can be made as to whether δ(E)
includes the true parameter value θ or not. For any value θ we shall
express the relation that δ(E) contains θ by the symbol δ(E) C θ. For
any relation R, the symbol P(R | θ) will denote the probability that R
holds when θ is the true parameter value.

According to Neyman, an interval function δ(E) is said to be a
confidence interval of θ if

(11:2)    P[\delta(E)\, C\, \theta \mid \theta] = \gamma

identically in θ where γ is a fixed value independent of θ. The relation
(11:2) simply says this: The probability that δ(E) will include the true
parameter value is always equal to γ no matter what the true value of
the parameter happens to be. The fixed value γ is called the confidence
coefficient associated with the confidence interval δ(E).
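
Relation (11:2) can be illustrated by simulation. The sketch below
assumes a case not worked out in the text: N normal observations with
unit variance and unknown mean θ, together with the familiar interval
extending a fixed distance z/√N on either side of the sample mean. The
function name and parameter values are hypothetical.

```python
import random
from statistics import NormalDist

def coverage(theta, N=10, gamma=0.95, trials=4000, seed=0):
    """Fraction of simulated samples whose interval contains theta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + gamma / 2.0)   # two-sided normal quantile
    half = z / N ** 0.5                           # half-length of the interval
    hits = 0
    for _ in range(trials):
        mean = sum(rng.gauss(theta, 1.0) for _ in range(N)) / N
        if mean - half <= theta <= mean + half:
            hits += 1
    return hits / trials

# The fraction should be close to gamma whatever the true theta is.
for theta in (-3.0, 0.0, 5.0):
    print(theta, coverage(theta))
```

In this assumed case the length of the interval is fixed in advance
because the variance is known; the coverage fraction stays near γ for
every θ tried, as (11:2) requires.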

1 J. Neyman, “Outline of a Theory of Statistical Estimation Based on the
Classical Theory of Probability,” Philosophical Transactions of the Royal Society of
London, Series A, Vol. 236 (1937), pp. 333-380.

Suppose, now, that the distribution of x involves several unknown
parameters, say θ_1, ..., θ_r. Any set of possible values θ_1, ..., θ_r can
be represented by a point θ, called a parameter point, in the
r-dimensional Cartesian space (parameter space). If we want to estimate
the parameters θ_1, ..., θ_r jointly, i.e., if we want to estimate the
parameter point θ, the estimating set will be some subset of the
r-dimensional parameter space. Whereas in the case of a single unknown
parameter, estimating sets other than intervals have little practical
value, this is not so when several unknown parameters are to be
estimated jointly. Estimating sets other than intervals in the
r-dimensional space, such as the interior of a sphere, or ellipse, or
more general regions, will have to be considered. Thus, we shall have
to consider a set function ω(E) which associates with each sample point
E a certain subset ω(E) of the parameter space without making the
restriction that ω(E) is an r-dimensional interval.

A set function ω(E) is said to be a confidence region of the
parameter point θ = (θ_1, ..., θ_r) if

(11:3)    P[\omega(E)\, C\, \theta \mid \theta] = \gamma

identically in θ where γ is a fixed value independent of θ. The value
γ is called the confidence coefficient of the confidence region ω(E).

If only one of the parameters θ_1, ..., θ_r is to be estimated, estimating
sets other than one-dimensional intervals will not be of much practical
interest, as in the case of a single unknown parameter. Suppose, for
example, that only θ_1 is to be estimated. According to Neyman, an
interval function δ(E) is said to be a confidence interval of θ_1 with
confidence coefficient γ if

(11:4)    P[\delta(E)\, C\, \theta_1 \mid \theta_1, \theta_2, \ldots, \theta_r] = \gamma

identically in θ_1, θ_2, ..., θ_r.

Usually there will be infinitely many confidence intervals δ(E) or
confidence regions ω(E) with a given confidence coefficient γ and a
fundamental problem is to find a proper confidence interval or
confidence region which has some optimum properties. It is clear that a
confidence interval or confidence region with a given confidence
coefficient γ will be regarded the better the shorter the interval or the
smaller the region. The notion “short” or “small” is to be made
precise, since the length of a confidence interval and the size of a
confidence region are random variables depending on the outcome of the
sample. This has been done in the theory developed by Neyman who
introduced various notions of optimum confidence intervals and
confidence regions. The mathematical consequences of these definitions
have been investigated and optimum confidence intervals and regions
have been derived in many important cases. It is not intended to go
into further details here and the reader is referred to the original
publications of Neyman on this subject.

11.2 Formulation of the Problem of Sequential Estimation by
Intervals or Sets

In estimation procedures based on a fixed number of observations, 
we cannot control, in general, the length of the confidence interval 
obtained, since this depends on the outcome of the sample. It may, 
therefore, sometimes happen that the confidence interval obtained is 
so long that it has little or no practical value. The possibility of such 
an event is a drawback inherent in estimation procedures based on a 
predetermined number of observations. 

For example, the length of the best confidence interval, based on a
fixed number of observations, for the mean of a normal population
with unknown standard deviation is proportional to the sample
estimate s of the population standard deviation σ. The sample standard
deviation s may take any value and is likely to be large if σ is large.

To devise estimation procedures which lead to confidence intervals
not only with a prescribed confidence coefficient but also with a
prescribed length, or with a length not exceeding a prescribed value, or
satisfying some other similar condition, it is, in general, necessary
to abandon the approach based on a fixed number of observations, and
estimation procedures of sequential nature have to be constructed. 2

The general nature of a sequential procedure of estimation by sets
may be described as follows. For any positive integer m we consider
a set S_m of samples of size m. These sets must satisfy the following
condition. If the sample E_m is an element of S_m and if E_{m'} (m' > m)
is an element of S_{m'}, then E_m must not be equal to the sample
consisting of the first m observations in E_{m'}. With any element E_m
of S_m (m = 1, 2, ..., ad inf.), we associate a subset ω(E_m) of the
parameter space. 3 The sequential process of estimation is then carried
out as follows. We continue to make observations on x until we reach a
value n such that E_n is an element of S_n. At this stage, we stop the
process

2 A very interesting sequential procedure has been devised by C. Stein, “A Two
Sample Test for a Linear Hypothesis whose Power Is Independent of the
Variance,” The Annals of Mathematical Statistics, Vol. XVI, Sept., 1945, which leads
to confidence intervals of fixed length in an important class of problems, including
the example mentioned before.

3 If we are concerned with interval estimation, ω(E_m) will always be an interval.

and state that ω(E_n) contains the true parameter point, i.e., ω(E_n) is
the confidence set resulting from the sequential estimation procedure.

Thus, a sequential estimation procedure is determined by the sample
sets S_1, S_2, ..., etc., and the set function ω(E) defined for all samples
E in S_1, S_2, ..., etc. The fundamental problem in sequential
estimation is that of a proper choice of S_1, S_2, ..., etc., and of ω(E).
First we impose the following two conditions:

Condition I. The confidence set ω(E_n) resulting from the sequential
estimation procedure should satisfy certain stated requirements
regarding its geometric shape.

Condition II. The confidence set ω(E_n) resulting from the
sequential estimation procedure should satisfy the inequality 4

    P[\omega(E_n)\, C\, \theta \mid \theta] \ge \gamma

for all parameter points θ. (The quantity γ is a fixed value which is
frequently chosen as high as .95, or more.)

The requirements to be imposed on the geometric shape of the
confidence set ω(E_n) do not constitute a statistical problem, and they
will be decided on the basis of practical considerations in each
particular problem. For example, if there is only one unknown parameter
θ (the parameter space is one-dimensional), we may want to require that
ω(E) be an interval whose length should not exceed some fixed
prescribed value d, or some given function of the midpoint of the
interval. The latter case may be of interest, for example, in estimating
the mean of a binomial distribution. If there are several unknown
parameters, say θ_1, ..., θ_r, and we want to estimate them jointly, we
may require that the Euclidean volume, or the diameter 5 of the
confidence set ω(E_n) does not exceed some fixed prescribed value. If we
merely want to estimate one of the unknown parameters, say θ_1, we may
impose the requirement that ω(E_n) be an interval with length not
exceeding some prescribed fixed value, or the weaker requirement that
ω(E_n) be a subset of the r-dimensional parameter space whose projection
on the θ_1-axis has a diameter not exceeding some preassigned value.

Usually there will exist infinitely many sequential estimation
procedures which satisfy Conditions I and II. The criterion for selecting
one from among them will be based on the expected number of
observations required by the estimation procedure. The sequential
estimation procedure may be regarded the better the smaller the expected
number of observations required by the procedure. Thus, we shall try
to select a sequential estimation procedure from the class of procedures
satisfying Conditions I and II for which the expected number of
observations to be made is as small as possible.

4 This is weaker than the requirement by Neyman that the equality sign should
hold.

5 The diameter of a set is the largest possible distance between two points of
the set.

The problem of finding an optimum estimation procedure is un- 
solved. However, a special class of estimation procedures satisfying 
Conditions I and II will be discussed briefly in the next section. It is 
doubtful whether this class of procedures contains an optimum solu- 
tion in the sense defined before. 

11.3 A Special Class of Sequential Estimation Procedures 

The special class of sampling plans based on the inequality (10:7),
and discussed in Section 10.5, can be used to obtain estimation
procedures satisfying Conditions I and II. With each sample point
E_n = (x_1, ..., x_n) (n = 1, 2, ..., ad inf.) we associate the set ω(E_n)
consisting of all parameter points θ for which (10:7) is fulfilled for
all values m ≤ n. If we put A = 1/(1 - γ), then ω(E_n) will satisfy
Condition II for each n. The estimation procedure is carried out as
follows. We continue taking observations as long as ω(E_n) does not
satisfy the requirements in Condition I. We stop the process at the
smallest n for which ω(E_n) satisfies Condition I and then state that
the true parameter point θ is included in ω(E_n). This rule of stopping
insures automatically the fulfillment of Condition I.

If p_m*(x_1, ..., x_m, θ) is chosen so that the probability is 1 that the
diameter of ω(E_m) will converge to 0 as m → ∞, and if Condition I is
such that any set of sufficiently small diameter satisfies it, the
probability is 1 that the estimation process will be terminated at a
finite stage.
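
A minimal sketch of such an estimation procedure follows, assuming
normal observations with unit variance and the particular choice
(10:12) of p_m*, so that each single-sample set is the interval given
by (10:16). Condition I is taken to be "length not exceeding d". All
names and numerical values are illustrative assumptions. With this
choice the length of the interval tends to δ rather than to 0, so the
sketch supposes δ < d.

```python
import math

def sequential_interval(observations, gamma, d, delta):
    """Return (n, (lower, upper)) at the first n with length <= d."""
    A = 1.0 / (1.0 - gamma)               # ensures Condition II
    lo, hi = -math.inf, math.inf
    total = 0.0
    for n, x in enumerate(observations, start=1):
        total += x
        mean = total / n
        log_v = math.log(A) + 0.5 * n * delta ** 2
        # psi(v): positive root of cosh(u) = v; log(2v) for very large v
        if log_v > 20.0:
            psi = log_v + math.log(2.0)
        else:
            psi = math.acosh(math.exp(log_v))
        half = psi / (n * delta)
        lo, hi = max(lo, mean - half), min(hi, mean + half)
        if hi - lo <= d:                  # Condition I (also met if empty)
            return n, (lo, hi)
    return None                           # data exhausted before stopping

# Artificial data: one hundred observations all equal to 0.
print(sequential_interval([0.0] * 100, gamma=0.95, d=1.0, delta=0.5))
```

The stopping rule depends on the data only through the running
intersection of the intervals, so the number of observations is itself
a random variable, unlike in the fixed-sample procedures of Section
11.1.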

It is doubtful whether the special class of procedures considered here 
contains an optimum procedure in the sense of the preceding section. 
Even if we are willing to restrict ourselves to procedures based on 
(10:7), there is no theory yet developed for the proper choice of 
p m *(x 1, •••,x m ,0). Our aim is, of course, to choose p m *(x lf • • •, x m 0) 
so that the expected number of observations required by the pro- 
cedure should be as small as possible. An optimum choice of 
p m *(x 1, will depend also on the nature of Condition I. 

For example, if a certain choice of $p_m^*(x_1, \cdots, x_m, \theta)$
is optimal when Condition I requires that the diameter of
$\omega(E_n)$ does not exceed a preassigned value, this choice will
probably not be optimal when Condition I requires that the diameter of
the projection of $\omega(E_n)$ on one of the parameter axes does not
exceed a preassigned value, and vice versa.

There may be some waste involved in putting $A = 1/(1 - \gamma)$, since
this may imply the validity of Condition II for a value $\gamma'$
substantially larger than the intended $\gamma$. A further development
of the theory may show that $A$ can be put equal to some value smaller
than $1/(1 - \gamma)$, which would lead to a saving in the number of
observations.


A.1 PROOF THAT THE PROBABILITY IS 1 THAT THE SEQUENTIAL
PROBABILITY RATIO TEST WILL EVENTUALLY TERMINATE

The sequential probability ratio test terminates at the nth trial
where n is the smallest integer for which either

$z_1 + \cdots + z_n \ge \log A$

or

$z_1 + \cdots + z_n \le \log B$

where $z_i = \log \frac{f(x_i, \theta_1)}{f(x_i, \theta_0)}$.

Let $c = |\log B| + |\log A|$. We shall subdivide the infinite
sequence $z_1, z_2, z_3, \cdots$, ad inf., into segments of length $r$
where $r$ is some positive integer. Thus, the first segment $S_1$ will
consist of the elements $z_1, \cdots, z_r$, the second segment $S_2$
will contain the elements $z_{r+1}, \cdots, z_{2r}$, etc. In general,
the $k$th segment $S_k$ will consist of the elements
$z_{(k-1)r+1}, \cdots, z_{kr}$. Let $\zeta_k$ denote the sum of the
elements in the $k$th segment. It can be seen that if the infinite
sequence $z_1, z_2, \cdots$, ad inf., is such that the sequential
process never terminates, then we must have

(A:1)  $|\zeta_k| < c$  for $k = 1, 2, \cdots$, ad inf.

Inequality (A:1) can also be written

(A:2)  $\zeta_k^2 < c^2$  for $k = 1, 2, \cdots$, ad inf.

Thus, in order to show that the probability is 1 that the sequential
process will eventually terminate, it is sufficient to prove that the
probability is 0 that (A:2) holds for all integral values $k$. For any
given positive integer $i$ denote by $P_i$ the probability that
$\zeta_i^2 < c^2$. Since $z_1, z_2, \cdots$, are independently
distributed, each having the same distribution, the distribution of
$\zeta_i$ must be the same for all values $i$. Hence, also $P_i$ is
independent of $i$ and we shall denote it by $P$. Since
$\zeta_1, \zeta_2, \cdots$, etc., are independently distributed, the
probability of the joint event that (A:2) holds for
$k = 1, 2, \cdots, j$ is equal to $P^j$. Hence, in order to show that
the probability is 0 that (A:2) holds for all values $k$, it is
sufficient to show that $P < 1$. Clearly, if the expected value of
$\zeta_i^2$ is $> c^2$, then $P$ must be $< 1$. Since the variance of
$z_i$ is assumed to be positive, the expected value of $\zeta_i^2$ can
be made arbitrarily large by choosing $r$, i.e., the number of elements
in a segment, sufficiently large. Thus, $P < 1$, and we have proved the
proposition: The probability is 1 that the sequential probability ratio
test procedure will eventually terminate.
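
The key step of the proof, that the expected value of the squared
segment sum exceeds $c^2$ once $r$ is large enough, can be made
concrete. The sketch below is an illustration only. It assumes the mean
and variance of $z$ are known and relies on the standard identity
$E(\zeta_k^2) = r \operatorname{Var}(z) + r^2 [E(z)]^2$ for a sum of
$r$ independent terms, which the text does not write out. The function
name is hypothetical.

```python
def smallest_segment_length(mean_z, var_z, c):
    """Smallest r for which E(zeta^2) = r*var_z + (r*mean_z)**2 exceeds c**2.

    mean_z, var_z: assumed mean and variance of a single z (var_z > 0).
    c: |log B| + |log A|, as defined in the proof.
    """
    if var_z <= 0:
        raise ValueError("the proof assumes a positive variance of z")
    r = 1
    # E(zeta^2) grows at least linearly in r, so this loop must stop.
    while r * var_z + (r * mean_z) ** 2 <= c ** 2:
        r += 1
    return r
```

With a segment length at least this large, $P < 1$ follows exactly as
in the argument above.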


A.2 UPPER AND LOWER LIMITS FOR THE OC FUNCTION OF A SEQUEN- 
TIAL TEST 

A.2.1 A Lemma 

In what follows we shall denote the expected value of any random
variable $z$ by $E(z)$. For any relation $R$ we shall use the symbol
$P(R)$ to denote the probability that $R$ holds. If the expected value
$E(z)$ or the probability $P(R)$ has been determined under the
assumption that $\theta$ is the true value of the parameter involved in
the distribution of the random variable under consideration, we shall
occasionally put this in evidence by using the symbols $E_\theta(z)$
and $P_\theta(R)$, respectively.¹

In deriving lower and upper limits for the OC function of a sequen- 
tial test, we shall make use of the following lemma. 

Lemma A.1. Let $z$ be a random variable such that the following three
conditions are fulfilled:

Condition I. The expected value $E(z)$ exists and is not equal to 0.

Condition II. There exists a positive $\delta$ such that
$P(e^z < 1 - \delta) > 0$ and $P(e^z > 1 + \delta) > 0$.

Condition III. For any real value $h$ the expected value
$E(e^{hz}) = g(h)$ exists.

Then there exists one and only one real value $h_0 \ne 0$ such that

$E(e^{h_0 z}) = 1$

Proof: For any positive $h$ we have

(A:3)  $g(h) > P(e^z > 1 + \delta)(1 + \delta)^h$

Hence, since $P(e^z > 1 + \delta) > 0$,

(A:4)  $\lim_{h \to \infty} g(h) = +\infty$

Similarly, we see that for any negative $h$

$g(h) > P(e^z < 1 - \delta)(1 - \delta)^h$

Hence, since $P(e^z < 1 - \delta) > 0$, we have

(A:5)  $\lim_{h \to -\infty} g(h) = +\infty$

¹ If there are several unknown parameters, say
$\theta_1, \cdots, \theta_k$, then $\theta$ denotes the set
$(\theta_1, \cdots, \theta_k)$.

Since $g''(h) = E(z^2 e^{hz})$,² it follows from Condition II that

(A:6)  $g''(h) > 0$

for all real values of $h$.

The relations (A:4), (A:5), (A:6) imply that there exists exactly one
real value $h^*$ such that $g(h)$ takes its minimum value for
$h = h^*$. Since $g'(0) = E(z)$ is unequal to 0 by Condition I, we see
that $h^* \ne 0$ and $g(h^*) < g(0) = 1$. It is clear that the function
$g(h)$ is monotonically decreasing in the strict sense over the
interval $(-\infty, h^*)$ and is monotonically increasing in the strict
sense over the interval $(h^*, +\infty)$. Since $g(0) = 1$ and
$g(h^*) < 1$, there exists exactly one real value $h_0 \ne 0$ such that
$g(h_0) = 1$. Hence Lemma A.1 is proved.

From the above considerations it follows that if $h^* > 0$ then also
$h_0 > 0$, and if $h^* < 0$ then also $h_0 < 0$. Furthermore, if
$h^* > 0$ then $E(z) = g'(0) < 0$, and if $h^* < 0$ then
$E(z) = g'(0) > 0$. Hence, $h_0$ and $E(z)$ are of opposite sign.
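
Lemma A.1 guarantees a unique nonzero root but gives no recipe for
finding it. The following is a minimal sketch, assuming a two-valued
variable $z$ that equals $\log(p_1/p_0)$ with probability $p$ and
$\log(q_1/q_0)$ with probability $1 - p$ (the binomial case treated
later in this appendix). It locates $h_0$ by bisection, searching only
on the side of 0 whose sign is opposite to that of $E(z)$. The function
name and the tolerance are illustrative assumptions.

```python
import math

def nonzero_root(p, p0, p1, tol=1e-12):
    """Nonzero h with p*(p1/p0)**h + (1-p)*((1-p1)/(1-p0))**h = 1."""
    a = math.log(p1 / p0)                # value of z when x = 1
    b = math.log((1 - p1) / (1 - p0))    # value of z when x = 0

    def g(h):
        return p * math.exp(h * a) + (1 - p) * math.exp(h * b)

    mean = p * a + (1 - p) * b           # E(z) = g'(0)
    if mean == 0.0:
        raise ValueError("E(z) = 0 violates Condition I")
    sign = -1.0 if mean > 0 else 1.0     # h0 and E(z) have opposite signs
    hi = 1.0
    while g(sign * hi) <= 1.0:           # move outward until g exceeds 1
        hi *= 2.0
    lo = hi
    while g(sign * lo) >= 1.0 and lo > 1e-12:   # move inward until g < 1
        lo *= 0.5
    while hi - lo > tol:                 # bisection on |h|
        mid = 0.5 * (lo + hi)
        if g(sign * mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return sign * 0.5 * (lo + hi)
```

The bracket works because $g$ is strictly convex with $g(0) = 1$, so on
the relevant side of 0 it first dips below 1 and then rises above it.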


A.2.2 A Fundamental Identity 

In this section we shall derive an identity which will play a
fundamental role. Consider the sequential probability ratio test for
testing the hypothesis $H_0$ that the probability distribution of $x$
is given by $f(x, \theta_0)$ against the alternative hypothesis $H_1$
that the probability distribution in question is given by
$f(x, \theta_1)$. Let

$z = \log \frac{f(x, \theta_1)}{f(x, \theta_0)}$  and  $z_i = \log \frac{f(x_i, \theta_1)}{f(x_i, \theta_0)}$

where $x_i$ denotes the $i$th observation on $x$. As defined in
Section 3.1, the test procedure is given as follows. Continue taking
observations as long as

(A:7)  $\log B < z_1 + \cdots + z_m < \log A$

where $A$ and $B$ ($B < A$) are constants determined before the
experimentation starts. Accept $H_0$ when

(A:8)  $z_1 + \cdots + z_m \le \log B$

and reject $H_0$ (accept $H_1$) when

(A:9)  $z_1 + \cdots + z_m \ge \log A$

² From Condition III it follows that all derivatives of $g(h)$ exist,
and they may be obtained by differentiation under the integral sign,
i.e.,

$\frac{d^r}{dh^r} g(h) = E(z^r e^{hz})$  ($r = 1, 2, \cdots$, ad inf.)

In what follows we shall denote by $n$ the number of observations
required by the test. Clearly, $n$ is a random variable. Let $D'$ be
the subset of the complex plane such that $E(e^{zt}) = \phi(t)$ exists
and is finite for any point $t$ in $D'$. Consider the following
identity:

(A:10)  $E(e^{Z_n t + (Z_N - Z_n)t}) = E(e^{Z_N t}) = [\phi(t)]^N$

where $N$ denotes a positive integer and $Z_i = z_1 + \cdots + z_i$.
Let $P_N$ be the probability that $n \le N$. For any random variable
$u$, let $E_N(u)$ denote the conditional expected value of $u$ under
the restriction that $n \le N$, and let $E_N^*(u)$ denote the
conditional expected value of $u$ under the restriction that $n > N$.
Then identity (A:10) can be written as

(A:11)  $P_N E_N(e^{Z_n t + (Z_N - Z_n)t}) + (1 - P_N) E_N^*(e^{Z_N t}) = [\phi(t)]^N$

Since in the subpopulation defined by any fixed $n \le N$ the
expression $Z_N - Z_n$ is independent of $Z_n$, we have

(A:12)  $E_N(e^{Z_n t + (Z_N - Z_n)t}) = E_N\{e^{Z_n t}[\phi(t)]^{N-n}\}$

From (A:11) and (A:12) we obtain the identity

(A:13)  $P_N E_N\{e^{Z_n t}[\phi(t)]^{N-n}\} + (1 - P_N) E_N^*(e^{Z_N t}) = [\phi(t)]^N$

Dividing both sides by $[\phi(t)]^N$ we obtain

(A:14)  $P_N E_N\{e^{Z_n t}[\phi(t)]^{-n}\} + (1 - P_N) \frac{E_N^*(e^{Z_N t})}{[\phi(t)]^N} = 1$

Let $D''$ be the subset of the complex plane in which
$|\phi(t)| \ge 1$ and let $D$ denote the common part of the subsets
$D'$ and $D''$. Since $\lim_{N \to \infty} (1 - P_N) = 0$, and since
$|E_N^*(e^{Z_N t})|$ is a bounded function of $N$, we have in $D$

(A:15)  $\lim_{N \to \infty} (1 - P_N) \frac{E_N^*(e^{Z_N t})}{[\phi(t)]^N} = 0$

Since

$\lim_{N \to \infty} P_N E_N\{e^{Z_n t}[\phi(t)]^{-n}\} = E\{e^{Z_n t}[\phi(t)]^{-n}\}$

we obtain from (A:14) and (A:15) the fundamental identity

(A:16)  $E\{e^{Z_n t}[\phi(t)]^{-n}\} = 1$

for any point $t$ in the set $D$.

A.2.3 Derivation of Upper and Lower Limits for the OC Function


The OC function of the sequential test is defined by the function
$L(\theta)$, where $L(\theta)$ denotes the probability that the
sequential process leads to the acceptance of $H_0$ when $\theta$ is
the true value of the parameter.³ It has been shown in Section A.1
that the probability is 0 that the sequential process will never
terminate, i.e., the relation $P(n = \infty) = 0$ has been proved.
Thus, the probability that the process will terminate with the
rejection of $H_0$ (acceptance of $H_1$) is given by $1 - L(\theta)$.
Using the fundamental identity derived in the preceding section we
shall obtain upper and lower limits for $L(\theta)$.
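
Before the fundamental identity is put to use, it may help to see it
hold numerically. The sketch below is an illustration under simplifying
assumptions that are not in the text: $z$ takes only the values $+1$
and $-1$ (with probabilities $p$ and $1 - p$), the boundaries are
integers, and $t$ is real with $\phi(t) \ge 1$. Under these assumptions
the expectation $E\{e^{Z_n t}[\phi(t)]^{-n}\}$ can be accumulated
exactly, path by path, and should come out equal to 1 up to rounding.
The function name is hypothetical.

```python
import math

def identity_lhs(p, t, lower, upper, tol=1e-18):
    """E{exp(Z_n t) * phi(t)**(-n)} for a +/-1 walk stopped outside (lower, upper).

    The walk starts at 0, moves +1 with probability p and -1 otherwise,
    and continues while lower < Z < upper (integers, lower < 0 < upper).
    """
    phi = p * math.exp(t) + (1 - p) * math.exp(-t)
    if phi < 1.0:
        raise ValueError("phi(t) < 1: t is outside the region used here")
    # weight[s] = P(at s after k steps, not yet stopped) * phi**(-k)
    weight = {0: 1.0}
    total = 0.0
    while sum(weight.values()) > tol:
        nxt = {}
        for s, w in weight.items():
            for s2, w2 in ((s + 1, w * p / phi), (s - 1, w * (1 - p) / phi)):
                if s2 <= lower or s2 >= upper:
                    # the test stops here and contributes e^{Z_n t} phi^{-n}
                    total += w2 * math.exp(s2 * t)
                else:
                    nxt[s2] = nxt.get(s2, 0.0) + w2
        weight = nxt
    return total
```

Because the unstopped weight shrinks geometrically when
$\phi(t) \ge 1$, the loop ends after a modest number of steps.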

It will be assumed that the distribution of
$z = \log \frac{f(x, \theta_1)}{f(x, \theta_0)}$ satisfies the three
conditions of Lemma A.1 for any value $\theta$. Then for any given
$\theta$ there exists exactly one real value $h(\theta) \ne 0$ such
that $E_\theta(e^{z h(\theta)}) = 1$.

Substituting $h(\theta)$ for $t$ in the fundamental identity (A:16), we
obtain

(A:17)  $E_\theta(e^{Z_n h(\theta)}) = 1$

since $\phi[h(\theta)] = 1$.

Let $E_\theta^*$ be the conditional expected value of
$e^{Z_n h(\theta)}$ under the restriction that $H_0$ is accepted, i.e.,
that $Z_n \le \log B$, and let $E_\theta^{**}$ be the conditional
expected value of $e^{Z_n h(\theta)}$ under the restriction that $H_1$
is accepted, i.e., that $Z_n \ge \log A$. Then we obtain, from (A:17),

(A:18)  $[L(\theta)] E_\theta^* + [1 - L(\theta)] E_\theta^{**} = 1$

Solving for $L(\theta)$ we obtain

(A:19)  $L(\theta) = \frac{E_\theta^{**} - 1}{E_\theta^{**} - E_\theta^*}$

If both the absolute value of $E_\theta(z)$ and the variance of $z$ are
small, as they will be when $f(x, \theta_1)$ is near $f(x, \theta_0)$,
then $E_\theta^*$ and $E_\theta^{**}$ will be nearly equal to
$B^{h(\theta)}$ and $A^{h(\theta)}$, respectively. Hence, in this case
a good approximation to $L(\theta)$ is given by the expression

(A:20)  $\bar{L}(\theta) = \frac{A^{h(\theta)} - 1}{A^{h(\theta)} - B^{h(\theta)}}$

This is the approximation formula (3:43) given in Section 3.4. It is
easy to verify that $h(\theta) = 1$ if $\theta = \theta_0$, and
$h(\theta) = -1$ if $\theta = \theta_1$. The difference
$L(\theta) - \bar{L}(\theta)$ approaches 0 if both the mean and the
variance of $z$ converge to 0.
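
The approximation formula is easy to evaluate once $h(\theta)$ is
known. The sketch below is illustrative: it takes
$A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$, the choice of
boundaries used elsewhere in this appendix, and checks the two values
$h = 1$ and $h = -1$ noted above. The function name and the numerical
values of $\alpha$ and $\beta$ are assumptions chosen for the example.

```python
def oc_approx(h, A, B):
    """Approximate OC value (A**h - 1) / (A**h - B**h), for h != 0."""
    if h == 0:
        raise ValueError("h(theta) must be nonzero")
    return (A ** h - 1.0) / (A ** h - B ** h)

alpha, beta = 0.05, 0.10          # illustrative error probabilities
A = (1 - beta) / alpha            # upper boundary constant
B = beta / (1 - alpha)            # lower boundary constant

at_theta0 = oc_approx(1.0, A, B)    # h = 1 at theta_0
at_theta1 = oc_approx(-1.0, A, B)   # h = -1 at theta_1
```

With these boundaries the approximation returns $1 - \alpha$ at
$\theta_0$ and $\beta$ at $\theta_1$, as a short algebraic check
confirms.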

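³ For simplicity the case of a single unknown parameter $\theta$ is
discussed, but the results can obviously be extended to any number of
parameters.

To judge the goodness of the approximation given by $\bar{L}(\theta)$,
it is desirable to derive lower and upper limits for $L(\theta)$. Such
limits can be obtained by deriving lower and upper limits for
$E_\theta^*$ and $E_\theta^{**}$. First we consider the case when
$h(\theta) > 0$. To obtain a lower limit for $E_\theta^*$ consider a
real variable $\zeta$ which is restricted to values $> 1$. For any
random variable $u$ and any relation $R$ we shall denote by
$E(u \mid R)$ the conditional expected value of $u$ under the
restriction that $R$ holds. Let $P_\theta(\zeta)$ denote the
probability that $e^{h(\theta) Z_{n-1}} < \zeta B^{h(\theta)}$. Then we
have

(A:21)  $E_\theta^* = \int_1^\infty \left[ \zeta B^{h(\theta)} E_\theta\left( e^{h(\theta) z} \mid e^{h(\theta) z} \le \frac{1}{\zeta} \right) \right] dP_\theta(\zeta)$

Hence, a lower bound of $E_\theta^*$ is given by

(A:22)  $B^{h(\theta)} \left[ \underset{\zeta}{\mathrm{g.l.b.}}\ \zeta E_\theta\left( e^{h(\theta) z} \mid e^{h(\theta) z} \le \frac{1}{\zeta} \right) \right]$

where the symbol g.l.b. stands for greatest lower bound with respect
to $\zeta$. Since $B^{h(\theta)}$ is an upper bound of $E_\theta^*$, we
obtain the limits

(A:23)  $B^{h(\theta)} \left[ \underset{\zeta}{\mathrm{g.l.b.}}\ \zeta E_\theta\left( e^{h(\theta) z} \mid e^{h(\theta) z} \le \frac{1}{\zeta} \right) \right] \le E_\theta^* \le B^{h(\theta)}$   $[h(\theta) > 0]$

To derive limits for $E_\theta^{**}$ consider a real variable $\rho$
which is restricted to values $> 0$ and $< 1$. Let $Q(\rho)$ denote the
probability that $e^{h(\theta) Z_{n-1}} < \rho A^{h(\theta)}$. Then we
obtain

(A:24)  $E_\theta^{**} = \int_0^1 \left[ \rho A^{h(\theta)} E_\theta\left( e^{h(\theta) z} \mid e^{h(\theta) z} \ge \frac{1}{\rho} \right) \right] dQ(\rho)$

Hence an upper bound of $E_\theta^{**}$ is given by

(A:25)  $A^{h(\theta)} \left[ \underset{\rho}{\mathrm{l.u.b.}}\ \rho E_\theta\left( e^{h(\theta) z} \mid e^{h(\theta) z} \ge \frac{1}{\rho} \right) \right]$

Since $A^{h(\theta)}$ is a lower bound of $E_\theta^{**}$, we obtain
the following limits for $E_\theta^{**}$:

(A:26)  $A^{h(\theta)} \le E_\theta^{**} \le A^{h(\theta)} \left[ \underset{\rho}{\mathrm{l.u.b.}}\ \rho E_\theta\left( e^{h(\theta) z} \mid e^{h(\theta) z} \ge \frac{1}{\rho} \right) \right]$   $[h(\theta) > 0]$

Putting

(A:27)  $\underset{\zeta}{\mathrm{g.l.b.}}\ \zeta E_\theta\left( e^{h(\theta) z} \mid e^{h(\theta) z} \le \frac{1}{\zeta} \right) = \eta_\theta$

and

(A:28)  $\underset{\rho}{\mathrm{l.u.b.}}\ \rho E_\theta\left( e^{h(\theta) z} \mid e^{h(\theta) z} \ge \frac{1}{\rho} \right) = \delta_\theta$

inequalities (A:23) and (A:26) can be written as

(A:29)  $B^{h(\theta)} \eta_\theta \le E_\theta^* \le B^{h(\theta)}$

and

(A:30)  $A^{h(\theta)} \le E_\theta^{**} \le A^{h(\theta)} \delta_\theta$

Since $B < 1$ and $A > 1$,⁴ we see that $E_\theta^* < 1$ and
$E_\theta^{**} > 1$ if $h(\theta) > 0$. From this and relations (A:19),
(A:29), and (A:30), it follows that

(A:31)  $\frac{A^{h(\theta)} - 1}{A^{h(\theta)} - \eta_\theta B^{h(\theta)}} \le L(\theta) \le \frac{\delta_\theta A^{h(\theta)} - 1}{\delta_\theta A^{h(\theta)} - B^{h(\theta)}}$

where $h(\theta) > 0$.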

If $h(\theta) < 0$, limits for $L(\theta)$ can be obtained as follows.
Let $z' = -z$, $A' = 1/B$ and $B' = 1/A$. Consider the sequential test
$S'$ defined as follows. Continue taking observations as long as
$\log B' < z'_1 + \cdots + z'_m < \log A'$. Terminate the process with
one or the other decision, depending on whether
$z'_1 + \cdots + z'_m \le \log B'$ or $\ge \log A'$. We shall let
$L'(\theta)$ be the probability that at the termination of the process
the cumulative sum $z'_1 + \cdots + z'_m$ is less than or equal to
$\log B'$. Then $L'(\theta) = 1 - L(\theta)$. Furthermore, we shall
denote the quantities $h(\theta)$, $\eta_\theta$, $\delta_\theta$
corresponding to the test $S'$ by $h'(\theta)$, $\eta'_\theta$, and
$\delta'_\theta$, respectively. We can apply (A:31) to the test $S'$,
since $h'(\theta) = -h(\theta) > 0$. Thus, we obtain

(A:32)  $\frac{A'^{h'(\theta)} - 1}{A'^{h'(\theta)} - \eta'_\theta B'^{h'(\theta)}} \le L'(\theta) \le \frac{\delta'_\theta A'^{h'(\theta)} - 1}{\delta'_\theta A'^{h'(\theta)} - B'^{h'(\theta)}}$

where $h'(\theta) > 0$. Since $\eta_\theta$ and $\delta_\theta$ depend
only on the distribution of $h(\theta) z$, and since
$h'(\theta) z' = h(\theta) z$, we have $\eta'_\theta = \eta_\theta$ and
$\delta'_\theta = \delta_\theta$. Substituting, in (A:32),
$\delta_\theta$ for $\delta'_\theta$, $\eta_\theta$ for $\eta'_\theta$,
$1/B$ for $A'$, $1/A$ for $B'$, $-h(\theta)$ for $h'(\theta)$, and
$1 - L(\theta)$ for $L'(\theta)$, we obtain

(A:33)  $\frac{B^{h(\theta)} - 1}{B^{h(\theta)} - \eta_\theta A^{h(\theta)}} \le 1 - L(\theta) \le \frac{\delta_\theta B^{h(\theta)} - 1}{\delta_\theta B^{h(\theta)} - A^{h(\theta)}}$

where $h(\theta) < 0$. Hence

(A:34)  $\frac{1 - A^{h(\theta)}}{\delta_\theta B^{h(\theta)} - A^{h(\theta)}} \le L(\theta) \le \frac{1 - \eta_\theta A^{h(\theta)}}{B^{h(\theta)} - \eta_\theta A^{h(\theta)}}$

where $h(\theta) < 0$.

⁴ We have assumed that $B < A$. Since we let $B = \beta/(1 - \alpha)$
and $A = (1 - \beta)/\alpha$, we must have
$\beta/(1 - \alpha) < (1 - \beta)/\alpha$. Multiplying this inequality
by $\alpha(1 - \alpha)$, we obtain
$\alpha\beta < 1 - \alpha - \beta + \alpha\beta$, i.e.,
$0 < 1 - \alpha - \beta$. Hence $\beta < 1 - \alpha$ and
$1 - \beta > \alpha$, and therefore $B < 1$ and $A > 1$.

We can summarize our results as follows. If $h(\theta) > 0$, limits for
$L(\theta)$ are given in (A:31). If $h(\theta) < 0$, limits for
$L(\theta)$ are given in (A:34). The quantities $\eta_\theta$ and
$\delta_\theta$ are defined in (A:27) and (A:28), respectively.

In Sections A.2.4 and A.2.5 we shall calculate the values of
$\delta_\theta$ and $\eta_\theta$ for binomial and normal
distributions. If the limits of $L(\theta)$ as given in (A:31) and
(A:34) are too far apart, it may be desirable to determine the exact
value of $L(\theta)$, or at least to find a closer approximation to
$L(\theta)$ than that given in (A:31) and (A:34). A method of dealing
with this problem is described in Section A.4. There the exact value of
$L(\theta)$ is derived when $z$ can take only a finite number of
integral multiples of a constant $d$. If $z$ does not have this
property, arbitrarily fine approximations to the value of $L(\theta)$
can be obtained, since the distribution of $z$ can be approximated to
any desired degree by a discrete distribution of the type mentioned
above if the constant $d$ is chosen sufficiently small.
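
The two pairs of limits summarized above can be collected in one small
routine. This is a sketch only. It assumes the caller supplies
$h(\theta) \ne 0$, the boundary constants $A > 1 > B > 0$, and the two
excess factors $\eta_\theta \le 1 \le \delta_\theta$; for $h > 0$ it
uses the bounds $B^h \eta \le E^* \le B^h$ and
$A^h \le E^{**} \le A^h \delta$ in $L = (E^{**} - 1)/(E^{**} - E^*)$,
and for $h < 0$ the mirrored form obtained from the test $S'$. The
function name is hypothetical.

```python
def oc_limits(h, A, B, eta, delta):
    """Return (lower, upper) limits for L(theta).

    h > 0 uses the form of (A:31); h < 0 uses the form of (A:34).
    eta <= 1 <= delta are the excess factors of (A:27) and (A:28).
    """
    Ah, Bh = A ** h, B ** h
    if h > 0:
        lower = (Ah - 1.0) / (Ah - eta * Bh)
        upper = (delta * Ah - 1.0) / (delta * Ah - Bh)
    elif h < 0:
        lower = (1.0 - Ah) / (delta * Bh - Ah)
        upper = (1.0 - eta * Ah) / (Bh - eta * Ah)
    else:
        raise ValueError("h(theta) must be nonzero")
    return lower, upper
```

Setting `eta = delta = 1`, which corresponds to no excess over the
boundaries, makes the two limits coincide with the approximation
$(A^h - 1)/(A^h - B^h)$.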


A.2.4 Calculation of $\delta_\theta$ and $\eta_\theta$ for Binomial Distributions

Let $X$ be a random variable which can take only the values 0 and 1.
Let $p_i$ be the probability that $X = 1$ when $H_i$ is true
($i = 0, 1$). Let $H$ be the hypothesis that $p$ is the probability
that $X = 1$. Denote $1 - p$ by $q$ and $1 - p_i$ by $q_i$
($i = 0, 1$). The distribution $f(x, p)$ of $x$ is given as follows:
$f(1, p) = p$ and $f(0, p) = q$. It can be assumed without loss of
generality that $p_1 > p_0$. The moment generating function of
$z = \log \frac{f(x, p_1)}{f(x, p_0)}$ is given by

$\phi(t) = E_p(e^{zt}) = E_p\left[ \frac{f(x, p_1)}{f(x, p_0)} \right]^t = p \left( \frac{p_1}{p_0} \right)^t + q \left( \frac{q_1}{q_0} \right)^t$

Let $h(p) \ne 0$ be the value of $t$ for which $\phi(t) = 1$, i.e.,

$p \left( \frac{p_1}{p_0} \right)^{h(p)} + q \left( \frac{q_1}{q_0} \right)^{h(p)} = 1$

First we consider the case when $h(p) > 0$. It is clear that
$e^{z h(p)} = \left[ \frac{f(x, p_1)}{f(x, p_0)} \right]^{h(p)} > 1$
implies that $x = 1$. Hence $e^{z h(p)} > 1$ implies that
$e^{z h(p)} = (p_1/p_0)^{h(p)}$. From this and the definition of
$\delta_p$ given in (A:28) it follows that

(A:35)  $\delta_p = \left( \frac{p_1}{p_0} \right)^{h(p)}$

where $h(p) > 0$. Similarly, the inequality $e^{z h(p)} < 1$ implies
that $e^{z h(p)} = (q_1/q_0)^{h(p)}$. From this and the definition of
$\eta_p$ given in (A:27) it follows that

(A:36)  $\eta_p = \left( \frac{q_1}{q_0} \right)^{h(p)}$

where $h(p) > 0$.

If $h(p) < 0$, it can be shown in a similar way that

(A:37)  $\delta_p = \left( \frac{q_1}{q_0} \right)^{h(p)}$

where $h(p) < 0$, and

(A:38)  $\eta_p = \left( \frac{p_1}{p_0} \right)^{h(p)}$

where $h(p) < 0$.


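A.2.5 Calculation of $\delta_\theta$ and $\eta_\theta$ for Normal Distributions

We shall now assume that $X$ is normally distributed with unknown mean
$\theta$ and known variance $\sigma^2$. We can assume without loss of
generality that $\sigma = 1$, since this can always be achieved by
multiplication by a proportionality factor. Then

(A:39)  $f(x, \theta_i) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - \theta_i)^2}$   ($i = 0, 1$)

and

(A:40)  $f(x, \theta) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x - \theta)^2}$

We can assume without loss of generality that $\theta_0 = -\Delta$ and
$\theta_1 = \Delta$ where $\Delta > 0$, since this can always be
achieved by a translation. Then

(A:41)  $z = \log \frac{f(x, \theta_1)}{f(x, \theta_0)} = 2\Delta x$

The moment generating function of $z$ is given by

(A:42)  $E_\theta(e^{zt}) = e^{2\Delta\theta t + 2\Delta^2 t^2}$

Hence

(A:43)  $h(\theta) = -\frac{\theta}{\Delta}$

Substituting this value of $h(\theta)$ in (A:27) and (A:28) we obtain

(A:44)  $\delta_\theta = \underset{\rho}{\mathrm{l.u.b.}}\ \rho E_\theta\left( e^{-2\theta x} \mid e^{-2\theta x} \ge \frac{1}{\rho} \right)$

and

(A:45)  $\eta_\theta = \underset{\zeta}{\mathrm{g.l.b.}}\ \zeta E_\theta\left( e^{-2\theta x} \mid e^{-2\theta x} \le \frac{1}{\zeta} \right)$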
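For any relation $R$ let $P_\theta^*(R)$ denote the probability that
the relation $R$ holds under the assumption that the distribution of
$x$ is normal with mean $\theta$ and variance unity. Furthermore, let
$P_\theta^{**}(R)$ denote the probability that $R$ holds if the
distribution of $x$ is normal with mean $-\theta$ and variance unity.
Since $e^{-2\theta x}$ is equal to the ratio of the normal probability
density function with mean $-\theta$ and variance unity to the normal
probability density function with mean $\theta$ and variance unity, we
see that

(A:46)  $E_\theta\left( e^{-2\theta x} \mid e^{-2\theta x} \ge \frac{1}{\rho} \right) = \frac{P_\theta^{**}(e^{-2\theta x} \ge 1/\rho)}{P_\theta^*(e^{-2\theta x} \ge 1/\rho)}$

and

(A:47)  $E_\theta\left( e^{-2\theta x} \mid e^{-2\theta x} \le \frac{1}{\zeta} \right) = \frac{P_\theta^{**}(e^{-2\theta x} \le 1/\zeta)}{P_\theta^*(e^{-2\theta x} \le 1/\zeta)}$

It can easily be verified that the right-hand members of (A:46) and
(A:47) have the same values for $\theta = \lambda$ as for
$\theta = -\lambda$. Thus, $\delta_\theta$ and $\eta_\theta$ also have
the same values for $\theta = \lambda$ as for $\theta = -\lambda$. It
will therefore be sufficient to compute $\delta_\theta$ and
$\eta_\theta$ for negative values of $\theta$. Let $\theta = -\lambda$
where $\lambda > 0$. First we show that
$\eta_\theta = 1/\delta_\theta$. Clearly,

(A:48)  $\zeta \frac{P_\theta^{**}(e^{2\lambda x} \le 1/\zeta)}{P_\theta^*(e^{2\lambda x} \le 1/\zeta)} = \zeta \frac{P_\theta^{**}(e^{-2\lambda x} \ge \zeta)}{P_\theta^*(e^{-2\lambda x} \ge \zeta)}$

Letting $\zeta = 1/\rho$ ($0 < \rho < 1$) in (A:48) gives

(A:49)  $\zeta \frac{P_\theta^{**}(e^{2\lambda x} \le 1/\zeta)}{P_\theta^*(e^{2\lambda x} \le 1/\zeta)} = \frac{P_\theta^{**}(e^{-2\lambda x} \ge 1/\rho)}{\rho P_\theta^*(e^{-2\lambda x} \ge 1/\rho)}$

Hence

(A:50)  $\eta_\theta = \underset{\zeta}{\mathrm{g.l.b.}}\ \zeta \frac{P_\theta^{**}(e^{2\lambda x} \le 1/\zeta)}{P_\theta^*(e^{2\lambda x} \le 1/\zeta)} = \frac{1}{\underset{\rho}{\mathrm{l.u.b.}} \left[ \rho \frac{P_\theta^*(e^{-2\lambda x} \ge 1/\rho)}{P_\theta^{**}(e^{-2\lambda x} \ge 1/\rho)} \right]}$

Because of the symmetry of the normal distribution, it is easily seen
that

$\rho \frac{P_\theta^*(e^{-2\lambda x} \ge 1/\rho)}{P_\theta^{**}(e^{-2\lambda x} \ge 1/\rho)} = \rho \frac{P_\theta^{**}(e^{2\lambda x} \ge 1/\rho)}{P_\theta^*(e^{2\lambda x} \ge 1/\rho)}$

Hence

(A:51)  $\eta_\theta = \frac{1}{\delta_\theta}$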

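Now we shall calculate the value of $\delta_\theta$. Let $G(x)$ denote
$\frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\, dt$. Then

$P_\theta^{**}\left( e^{2\lambda x} \ge \frac{1}{\rho} \right) = P_\theta^{**}\left( 2\lambda x \ge \log \frac{1}{\rho} \right) = P_\theta^{**}\left( x \ge \frac{1}{2\lambda} \log \frac{1}{\rho} \right) = G\left( \frac{1}{2\lambda} \log \frac{1}{\rho} - \lambda \right)$

Similarly

$P_\theta^*\left( e^{2\lambda x} \ge \frac{1}{\rho} \right) = P_\theta^*\left( x \ge \frac{1}{2\lambda} \log \frac{1}{\rho} \right) = G\left( \frac{1}{2\lambda} \log \frac{1}{\rho} + \lambda \right)$

Let $u$ denote $(1/2\lambda) \log (1/\rho)$. Since $\rho$ can vary from
0 to 1, $u$ can take any value from 0 to $\infty$. Since
$\rho = e^{-2\lambda u}$, we have

(A:52)  $\delta_\theta = \underset{u}{\mathrm{l.u.b.}} \left[ e^{-2\lambda u} \frac{G(u - \lambda)}{G(u + \lambda)} \right]$

We shall prove that

(A:53)  $\chi(u) = e^{-2\lambda u} \frac{G(u - \lambda)}{G(u + \lambda)}$   ($0 \le u < \infty$)

is a monotonically decreasing function of $u$ and consequently has a
maximum at $u = 0$. For this purpose it suffices to show that the
derivative of $\log \chi(u)$ is never positive. Now

(A:54)  $\log \chi(u) = \log G(u - \lambda) - \log G(u + \lambda) - 2\lambda u$

Let $\Phi(x)$ denote $\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2}$.
Since $\frac{d}{du} G(u) = -\Phi(u)$, it follows from (A:54) that

(A:55)  $\frac{d}{du} \log \chi(u) = -\frac{\Phi(u - \lambda)}{G(u - \lambda)} + \frac{\Phi(u + \lambda)}{G(u + \lambda)} - 2\lambda$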

It follows from the mean value theorem that the right-hand side of
(A:55) is never positive if
$\frac{d}{du}\left[ \frac{\Phi(u)}{G(u)} \right]$ is equal to or less
than 1 for all values of $u$. Thus, we need merely to show that

(A:56)  $\frac{d}{du}\left[ \frac{\Phi(u)}{G(u)} \right] \le 1$

Let $y$ denote $\Phi(u)/G(u)$. Then (A:56) can be written as

$\frac{\Phi'(u)G(u) - G'(u)\Phi(u)}{G^2(u)} = \frac{\Phi'(u)G(u) + \Phi^2(u)}{G^2(u)} = \frac{\Phi^2(u)}{G^2(u)} - u\frac{\Phi(u)}{G(u)} = y^2 - uy \le 1$

The roots of the equation $y^2 - uy - 1 = 0$ are

$y = \frac{u \pm \sqrt{u^2 + 4}}{2}$

Hence the inequality $y^2 - uy - 1 \le 0$ holds if, and only if,

$\frac{u - \sqrt{u^2 + 4}}{2} \le y \le \frac{u + \sqrt{u^2 + 4}}{2}$



Since $y$ cannot be negative, this inequality is equivalent to

(A:57)  $\frac{\Phi(u)}{G(u)} \le \frac{u + \sqrt{u^2 + 4}}{2}$

Thus we merely have to prove (A:57). We shall show that (A:57) holds
for all real values of $u$. Birnbaum⁵ has shown that for $u > 0$

(A:58)  $\frac{\sqrt{u^2 + 4} - u}{2} \Phi(u) \le G(u)$

Hence

(A:59)  $\frac{\Phi(u)}{G(u)} \le \frac{2}{\sqrt{u^2 + 4} - u} = \frac{\sqrt{u^2 + 4} + u}{2}$   ($u > 0$)

which proves (A:57) for $u > 0$. Now we prove (A:57) for $u < 0$. Let
$u = -v$ where $v > 0$. Then it follows from (A:59) that

(A:60)  $\frac{\Phi(v)}{G(v)} \le \frac{2}{\sqrt{4 + v^2} - v}$

Taking reciprocals, we obtain, from (A:60),

(A:61)  $\frac{G(v)}{\Phi(v)} \ge \frac{\sqrt{4 + v^2} - v}{2}$

Since

$\frac{G(u)}{\Phi(u)} \ge \frac{G(v) + 2v\Phi(v)}{\Phi(v)} = \frac{G(v)}{\Phi(v)} + 2v$

we obtain, from (A:61),

(A:62)  $\frac{G(u)}{\Phi(u)} \ge \frac{\sqrt{v^2 + 4} + 3v}{2} \ge \frac{\sqrt{v^2 + 4} + v}{2}$

Taking reciprocals, we obtain

$\frac{\Phi(u)}{G(u)} \le \frac{2}{\sqrt{v^2 + 4} + v} = \frac{\sqrt{v^2 + 4} - v}{2} = \frac{\sqrt{u^2 + 4} + u}{2}$

Hence (A:57) is proved for all values of $u$ and consequently
$\delta_\theta$ is equal to the value of the expression (A:53) if we
substitute 0 for $u$. Thus

(A:63)  $\delta_\theta = \frac{G(-\lambda)}{G(\lambda)}$   ($\lambda = |\theta|$)

⁵ Z. W. Birnbaum, “An Inequality for Mills’ Ratio,” The Annals of
Mathematical Statistics, Vol. XIII (1942).

Formula (A:63) has been derived for the case in which
$\theta_0 = -\Delta$, $\theta_1 = \Delta$, and $\sigma = 1$. It can
easily be seen that for general values $\theta_0$, $\theta_1$, and
$\sigma$ we have

(A:64)  $\delta_\theta = \frac{G(-\lambda)}{G(\lambda)}$  where  $\lambda = \frac{1}{\sigma} \left| \frac{\theta_0 + \theta_1}{2} - \theta \right|$
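
The closed form for $\delta_\theta$ in the normal case, and the
monotonicity on which it rests, can be checked numerically. The sketch
below is illustrative only. It takes $G$ to be the upper tail of the
standard normal distribution, computed with the standard-library
function `math.erfc`, reads $\lambda$ as the distance of $\theta$ from
the midpoint $(\theta_0 + \theta_1)/2$ in units of $\sigma$, and
evaluates $\chi(u) = e^{-2\lambda u} G(u - \lambda)/G(u + \lambda)$ on
a grid. The function names, default arguments, and grid are
assumptions.

```python
import math

def G(x):
    """Upper tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def chi(u, lam):
    """exp(-2*lam*u) * G(u - lam) / G(u + lam), the function maximized at u = 0."""
    return math.exp(-2.0 * lam * u) * G(u - lam) / G(u + lam)

def delta_normal(theta, theta0=-1.0, theta1=1.0, sigma=1.0):
    """delta_theta = G(-lam) / G(lam) with lam = |midpoint - theta| / sigma."""
    lam = abs(0.5 * (theta0 + theta1) - theta) / sigma
    return G(-lam) / G(lam)

grid = [0.05 * k for k in range(101)]       # u from 0 to 5
values = [chi(u, 1.0) for u in grid]        # lam = 1 for illustration
eta_at_one = 1.0 / delta_normal(1.0)        # eta_theta = 1 / delta_theta
```

On this grid the values of `chi` decrease steadily from their value at
$u = 0$, which is the quantity returned by `delta_normal`.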


A.3 UPPER AND LOWER LIMITS FOR THE ASN FUNCTION OF A SE- 
QUENTIAL PROBABILITY RATIO TEST 


A.3.1 Derivation of General Formulas for Upper and Lower Limits 

As before, let

$z = \log \frac{f(x, \theta_1)}{f(x, \theta_0)}$,   $z_i = \log \frac{f(x_i, \theta_1)}{f(x_i, \theta_0)}$   ($i = 1, 2, \cdots$, ad inf.)

and let $n$ be the number of observations required by the sequential
test, i.e., $n$ is the smallest integer for which
$Z_n = z_1 + \cdots + z_n$ is either $\ge \log A$ or $\le \log B$. To
determine the expected value $E_\theta(n)$ of $n$ under the hypothesis
$H$ that $\theta$ is the true value of the parameter, we shall consider
a fixed positive integer $N$. The sum $Z_N = z_1 + \cdots + z_N$ can be
split in two parts as follows:

(A:65)  $Z_N = Z_n + Z'_n$

where $Z'_n = z_{n+1} + \cdots + z_N$ if $n \le N$ and
$Z'_n = Z_N - Z_n$ if $n > N$.

Taking expected values on both sides of (A:65) we obtain

$N E_\theta(z) = E_\theta(Z_n + Z'_n)$

Let $P_N$ denote the probability that $n \le N$. Then

$E_\theta(Z_n + Z'_n) = P_N E_{\theta N}(Z_n + Z'_n) + (1 - P_N) E_{\theta N}^*(Z_N)$

where the operator $E_{\theta N}$ means conditional expected value when
$n \le N$, and $E_{\theta N}^*$ means conditional expected value when
$n > N$.

Since $Z_N$ lies between $\log B$ and $\log A$ when $n > N$, and since
$\lim_{N \to \infty} (1 - P_N) = 0$, we obtain from the last two
equations

(A:66)  $\lim_{N \to \infty} [N E_\theta(z) - P_N E_{\theta N}(Z_n + Z'_n)] = 0$

For any given value of $n < N$, the variates $z_{n+1}, \cdots, z_N$ are
independently distributed, each having the same distribution as $z$.
Hence, we have

$E_{\theta N}(Z'_n) = E_{\theta N}(N - n) E_\theta(z) = -E_{\theta N}(n) E_\theta(z) + N E_\theta(z)$

From this and (A:66) we obtain, since
$\lim_{N \to \infty} (1 - P_N) N = 0$,¹

(A:67)  $\lim_{N \to \infty} [P_N E_{\theta N}(n) E_\theta(z) - P_N E_{\theta N}(Z_n)] = 0$

Since

$\lim_{N \to \infty} P_N E_{\theta N}(n) = E_\theta(n)$  and  $\lim_{N \to \infty} P_N E_{\theta N}(Z_n) = E_\theta(Z_n)$

equation (A:67) gives

(A:68)  $E_\theta(Z_n) = E_\theta(n) E_\theta(z)$

Hence

(A:69)  $E_\theta(n) = \frac{E_\theta(Z_n)}{E_\theta(z)}$

if $E_\theta(z) \ne 0$. Let $E_\theta^*(Z_n)$ be the conditional
expected value of $Z_n$ under the restriction that the sequential
analysis leads to the acceptance of $H_0$, i.e., that
$Z_n \le \log B$. Similarly, let $E_\theta^{**}(Z_n)$ be the
conditional expected value of $Z_n$ under the restriction that $H_1$ is
accepted, i.e., that $Z_n \ge \log A$. Since $L(\theta)$ is the
probability that $Z_n \le \log B$, and $1 - L(\theta)$ is the
probability that $Z_n \ge \log A$, we have

(A:70)  $E_\theta(Z_n) = [L(\theta)] E_\theta^*(Z_n) + [1 - L(\theta)] E_\theta^{**}(Z_n)$

From (A:69) and (A:70) we obtain

(A:71)  $E_\theta(n) = \frac{[L(\theta)] E_\theta^*(Z_n) + [1 - L(\theta)] E_\theta^{**}(Z_n)}{E_\theta(z)}$


The exact value of $E_\theta(Z_n)$, and therefore also the exact value
of $E_\theta(n)$, can be computed if $z$ can take only integral
multiples of a constant $d$, since in this case the exact probability
distribution of $Z_n$ was obtained (see Section A.4). If $z$ does not
satisfy the above restriction, it is still possible to obtain
arbitrarily fine approximations to the value of $E_\theta(Z_n)$, since
the distribution of $z$ can be approximated to any desired degree by a
discrete distribution of the type mentioned above provided the constant
$d$ is chosen sufficiently small.

¹ C. Stein has shown, in “A Note on Cumulative Sums,” The Annals of
Mathematical Statistics, Vol. 17 (1946), that all moments of $n$ must
be finite. This implies that $\lim_{N \to \infty} (1 - P_N) N^k = 0$
for any positive integer $k$.

If both $|E_\theta(z)|$ and the standard deviation of $z$ are small,
$E_\theta^*(Z_n)$ is very nearly equal to $\log B$ and
$E_\theta^{**}(Z_n)$ is very nearly equal to $\log A$. Hence in this
case we can write

(A:72)  $E_\theta(n) \sim \frac{[L(\theta)] \log B + [1 - L(\theta)] \log A}{E_\theta(z)}$

This is the same approximation formula as given in (3:57).
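
The approximation can be compared with an exact computation in a case
where there is no excess over the boundaries. The sketch below is an
illustration under assumptions not made in the text: $z$ takes only the
values $+1$ and $-1$ (as happens, up to a scale factor, in a binomial
test with $p_1 = 1 - p_0$), and $\log B$ and $\log A$ are integers
`lower` and `upper`. Then $Z_n$ lands exactly on a boundary, so the
formula $\{L \log B + (1 - L) \log A\}/E(z)$ with
$L = (A^h - 1)/(A^h - B^h)$ should agree with the exact expected
number of observations. All names are hypothetical.

```python
import math

def exact_asn_and_oc(p, lower, upper, tol=1e-15):
    """Exact E(n) and P(stop at lower) for a +/-1 walk started at 0.

    The walk moves +1 with probability p, -1 otherwise, and continues
    while lower < Z < upper (integers, lower < 0 < upper).
    """
    alive = {0: 1.0}          # probability of each position, unstopped paths only
    expected_n = 0.0
    hit_lower = 0.0
    while sum(alive.values()) > tol:
        expected_n += sum(alive.values())     # adds P(n > k), k = 0, 1, 2, ...
        nxt = {}
        for s, w in alive.items():
            for s2, w2 in ((s + 1, w * p), (s - 1, w * (1 - p))):
                if s2 <= lower:
                    hit_lower += w2
                elif s2 < upper:
                    nxt[s2] = nxt.get(s2, 0.0) + w2
        alive = nxt
    return expected_n, hit_lower

def wald_asn(p, lower, upper):
    """{L*log B + (1-L)*log A} / E(z) with log B = lower, log A = upper, p != 1/2."""
    h = math.log((1 - p) / p)      # nonzero root of p*e^h + (1-p)*e^(-h) = 1
    Ah, Bh = math.exp(h * upper), math.exp(h * lower)
    L = (Ah - 1.0) / (Ah - Bh)
    return (L * lower + (1 - L) * upper) / (2 * p - 1)
```

When $z$ can overshoot the boundaries the two quantities differ, which
is what the limits derived next are meant to control.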

To judge the goodness of the approximation given in (A:72) we shall
derive lower and upper limits for $E_\theta(n)$ by deriving lower and
upper limits for $E_\theta^*(Z_n)$ and $E_\theta^{**}(Z_n)$. Let $r$ be
a non-negative variable and let

(A:73)  $\xi_\theta = \max_r E_\theta(z - r \mid z \ge r)$   ($r \ge 0$)

and

(A:74)  $\xi'_\theta = \min_r E_\theta(z + r \mid z + r \le 0)$   ($r \ge 0$)

It is easy to see that

(A:75)  $\log A \le E_\theta^{**}(Z_n) \le \log A + \xi_\theta$

and

(A:76)  $\log B + \xi'_\theta \le E_\theta^*(Z_n) \le \log B$

We obtain from (A:71), (A:75), and (A:76)

(A:77)  $\frac{L(\theta)(\log B + \xi'_\theta) + [1 - L(\theta)] \log A}{E_\theta(z)} \le E_\theta(n) \le \frac{[L(\theta)] \log B + [1 - L(\theta)](\log A + \xi_\theta)}{E_\theta(z)}$   if $E_\theta(z) > 0$

and

(A:78)  $\frac{[L(\theta)] \log B + [1 - L(\theta)](\log A + \xi_\theta)}{E_\theta(z)} \le E_\theta(n) \le \frac{L(\theta)(\log B + \xi'_\theta) + [1 - L(\theta)] \log A}{E_\theta(z)}$   if $E_\theta(z) < 0$

The limits given in (A:77) and (A:78) will generally be close to each
other for values $\theta \le \theta_0$ and $\theta \ge \theta_1$.
However, for values $\theta$ between $\theta_0$ and $\theta_1$ the
difference between the upper and lower limits may become very large,
since $E_\theta(z)$ may be near (or equal to) 0 for such values
$\theta$. In fact, we have seen that $E_{\theta_0}(z) < 0$ and
$E_{\theta_1}(z) > 0$. Hence, if $E_\theta(z)$ is a continuous function
of $\theta$, there will be a value $\theta'$ between $\theta_0$ and
$\theta_1$ such that $E_{\theta'}(z) = 0$. For $\theta = \theta'$ or
for values $\theta$ very near $\theta'$ the limits given in (A:77) and
(A:78) are of no practical value, since they are far apart.
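
The two cases of these limits differ only in which numerator gives the
lower and which the upper limit, according to the sign of
$E_\theta(z)$. A minimal sketch follows. It assumes the caller supplies
$L(\theta)$, $\log A$, $\log B$, $E_\theta(z) \ne 0$, a non-negative
bound `xi` on the expected excess over $\log A$, and a non-positive
bound `xi_prime` on the expected excess below $\log B$. The function
and argument names are hypothetical.

```python
def asn_limits(L, log_A, log_B, xi, xi_prime, mean_z):
    """Return (lower, upper) limits for E(n) in the form of (A:77)/(A:78)."""
    if mean_z == 0:
        raise ValueError("the limits require E(z) != 0")
    # Numerator with the excess allowed at the lower boundary only.
    small = L * (log_B + xi_prime) + (1 - L) * log_A
    # Numerator with the excess allowed at the upper boundary only.
    large = L * log_B + (1 - L) * (log_A + xi)
    a, b = small / mean_z, large / mean_z
    # Dividing by a negative E(z) reverses the order of the two quotients.
    return (a, b) if mean_z > 0 else (b, a)
```

As the text notes, both quotients grow without bound as `mean_z`
approaches 0, so the interval becomes uninformative near $\theta'$.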

We shall now derive limits for $E_\theta(n)$ which can be used for
values $\theta$ in the neighborhood of $\theta'$.² For this purpose, we
shall expand $e^{h(\theta) Z_n}$ in a Taylor series as follows:

(A:79)  $e^{h(\theta) Z_n} = 1 + h(\theta) Z_n + \frac{1}{2}[h(\theta)]^2 Z_n^2 + \frac{1}{6}[h(\theta)]^3 Z_n^3 e^{\lambda}$

where $\lambda$ is some value between 0 and $h(\theta) Z_n$. From
(A:17) and (A:79) we obtain

(A:80)  $h(\theta) E_\theta(Z_n) = -\frac{1}{2}[h(\theta)]^2 E_\theta(Z_n^2) - \frac{1}{6}[h(\theta)]^3 E_\theta(Z_n^3 e^{\lambda})$

From this and (A:69) it follows that

(A:81)  $E_\theta(n) = -\frac{h(\theta)}{2E_\theta(z)} E_\theta(Z_n^2) - \frac{[h(\theta)]^2}{6E_\theta(z)} E_\theta(Z_n^3 e^{\lambda})$

Thus, upper and lower limits for $E_\theta(n)$ can be obtained by
deriving upper and lower limits for $E_\theta(Z_n^2)$ and
$E_\theta(Z_n^3 e^{\lambda})$. To derive limits for $E_\theta(Z_n^2)$,
we write

(A:82)  $E_\theta(Z_n^2) = L(\theta) E_\theta^*(Z_n^2) + [1 - L(\theta)] E_\theta^{**}(Z_n^2)$

where the operator $E^*$ stands for conditional expected value when
$Z_n \le \log B$, and $E^{**}$ stands for conditional expected value
when $Z_n \ge \log A$. Let $\epsilon'$ denote $Z_n - \log B$ and
$\epsilon''$ denote $Z_n - \log A$. Then

(A:83)  $E_\theta^*(Z_n^2) = (\log B)^2 + 2(\log B) E_\theta^*(\epsilon') + E_\theta^*(\epsilon'^2)$

and

(A:84)  $E_\theta^{**}(Z_n^2) = (\log A)^2 + 2(\log A) E_\theta^{**}(\epsilon'') + E_\theta^{**}(\epsilon''^2)$

Since $E_\theta^*(\epsilon'^2) \ge 0$ and
$(\log B) E_\theta^*(\epsilon') \ge 0$, we obtain, from (A:83),

(A:85)  $(\log B)^2 \le E_\theta^*(Z_n^2)$

² See also the author's paper, “Some Improvements in Setting Limits for
the Expected Number of Observations Required by a Sequential
Probability Ratio Test,” The Annals of Mathematical Statistics, Vol. 17
(1946).

The quantity $\xi'_\theta$ given in (A:74) is a lower bound for
$E_\theta^*(\epsilon')$. Since $\log B < 0$, $(\log B)\xi'_\theta$ is
an upper bound for $(\log B) E_\theta^*(\epsilon')$. An upper bound for
$E_\theta^*(\epsilon'^2)$ is given by

(A:86)  $\zeta'_\theta = \max_r E_\theta[(z + r)^2 \mid z + r \le 0]$   ($r \ge 0$)

Hence

(A:87)  $E_\theta^*(Z_n^2) \le (\log B)^2 + 2(\log B)\xi'_\theta + \zeta'_\theta$

Thus we obtain the limits

(A:88)  $(\log B)^2 \le E_\theta^*(Z_n^2) \le (\log B)^2 + 2(\log B)\xi'_\theta + \zeta'_\theta$

In a similar way, the following limits can be derived for
$E_\theta^{**}(Z_n^2)$:

(A:89)  $(\log A)^2 \le E_\theta^{**}(Z_n^2) \le (\log A)^2 + 2(\log A)\xi_\theta + \zeta_\theta$

where $\xi_\theta$ is given in (A:73) and

(A:90)  $\zeta_\theta = \max_r E_\theta[(z - r)^2 \mid z \ge r]$   ($r \ge 0$)

If we denote by $L'(\theta)$ the lower limit and by $L''(\theta)$ the
upper limit of $L(\theta)$ given in (A:31) [(A:34) when
$h(\theta) < 0$], we obtain from (A:82), (A:88), and (A:89) the
following limits for $E_\theta(Z_n^2)$:

(A:91)  $L'(\theta)(\log B)^2 + [1 - L''(\theta)](\log A)^2 \le E_\theta(Z_n^2) \le L''(\theta)[(\log B)^2 + 2(\log B)\xi'_\theta + \zeta'_\theta] + [1 - L'(\theta)][(\log A)^2 + 2(\log A)\xi_\theta + \zeta_\theta]$

Using a similar method, one can also derive upper and lower limits for
$E_\theta(Z_n^3 e^{\lambda})$ without any difficulty. We shall,
however, not derive such limits here, since we are interested in
obtaining limits for $E_\theta(n)$ when $\theta$ is near $\theta'$ and
since, for such values of $\theta$, the second term in the right-hand
member of (A:81) is negligible. We shall show that, if $h(\theta)$,
$E_\theta(z)$, and $E_\theta(z^2)$ are continuous functions of
$\theta$, the factor $[h(\theta)]^2 / [E_\theta(z)]$ in that term
converges to 0 as $\theta \to \theta'$. It follows from the discussion
given in Section A.2.1 that $\lim_{\theta \to \theta'} h(\theta) = 0$.
Since

(A:92)  $E_\theta(e^{h(\theta) z}) = E_\theta\left[ 1 + h(\theta) z + \frac{[h(\theta)]^2}{2!} z^2 + \frac{[h(\theta)]^3}{3!} z^3 e^{u h(\theta) z} \right] = 1$   ($0 \le u \le 1$)

we obtain, when $h(\theta) \ne 0$,

(A:93)  $E_\theta\left[ z + \frac{h(\theta)}{2!} z^2 + \frac{[h(\theta)]^2}{3!} z^3 e^{u h(\theta) z} \right] = 0$

Thus

(A:94)  $\frac{E_\theta(z)}{h(\theta)} = E_\theta\left[ -\frac{1}{2} z^2 - \frac{h(\theta)}{3!} z^3 e^{u h(\theta) z} \right]$   $[h(\theta) \ne 0]$

Assuming that $E_\theta(e^{|z|})$ is a bounded function of $\theta$ in
the neighborhood of $\theta'$, we see that
$E_\theta(|z|^3 e^{|h(\theta)||z|})$ is also a bounded function of
$\theta$ in a sufficiently small neighborhood of $\theta'$.³ Hence,
$E_\theta(z^3 e^{u h(\theta) z})$ is also a bounded function of
$\theta$ in the neighborhood of $\theta'$. From this and (A:94) it
follows that

(A:95)  $\lim_{\theta \to \theta'} \frac{E_\theta(z)}{h(\theta)} = -\frac{1}{2} E_{\theta'}(z^2) < 0$

From (A:95) it follows that

(A:96)  $\lim_{\theta \to \theta'} \frac{[h(\theta)]^2}{E_\theta(z)} = 0$

The lower and upper limits for $E_\theta(n)$, based on (A:81), will
generally be close to each other for values $\theta$ in a small
neighborhood of $\theta'$. Thus, when $\theta$ is near $\theta'$ these
limits for $E_\theta(n)$ can be used instead of the limits given in
(A:77) and (A:78).

It may be of interest to determine the limiting form of (A:81) when
$\theta = \theta'$. If $E_\theta(Z_n^2)$ is a continuous function of
$\theta$ and $E_\theta(Z_n^3 e^{\lambda})$ is a bounded function of
$\theta$ in the neighborhood of $\theta'$, it follows from (A:81),
(A:95), and (A:96) that⁴

(A:97)  $E_{\theta'}(n) = \frac{E_{\theta'}(Z_n^2)}{E_{\theta'}(z^2)}$

The boundedness of $E_\theta(Z_n^3 e^{\lambda})$ can be proved if, for
$t = \pm 1$, the expected value
$\rho E_\theta(e^{tz} \mid e^{tz} \ge 1/\rho)$ is a bounded function of
$\theta$ and $\rho$ ($0 < \rho < 1$). Since
$\lim_{\theta \to \theta'} h(\theta) = 0$, there exists a constant $C$
such that $|Z_n^3 e^{\lambda}| \le C e^{|Z_n|}$ for $\theta$ in the
neighborhood of $\theta'$. Hence, we merely have to show that
$E_\theta(e^{|Z_n|})$ is bounded. Since
$e^{Z_n} + e^{-Z_n} \ge e^{|Z_n|}$, it is sufficient to show that both
$E_\theta(e^{Z_n})$ and $E_\theta(e^{-Z_n})$ are bounded. We have

$E_\theta(e^{Z_n} \mid Z_n \ge \log A) \le A\ \underset{\rho}{\mathrm{l.u.b.}} \left[ \rho E_\theta\left( e^z \mid e^z \ge \frac{1}{\rho} \right) \right]$

where $0 < \rho < 1$. Since

$E_\theta(e^{Z_n} \mid Z_n \le \log B) \le B$

we obtain

$E_\theta(e^{Z_n}) \le A\ \underset{\rho}{\mathrm{l.u.b.}} \left[ \rho E_\theta\left( e^z \mid e^z \ge \frac{1}{\rho} \right) \right] + B$

The right-hand member of this inequality is bounded, since
$\rho E_\theta(e^z \mid e^z \ge 1/\rho)$ is bounded by assumption.
Hence $E_\theta(e^{Z_n})$ is bounded. The boundedness of
$E_\theta(e^{-Z_n})$ can be shown in a similar way. Upper and lower
limits for $E_{\theta'}(n)$ can be obtained from (A:97) by substituting
for $E_{\theta'}(Z_n^2)$ the upper and lower limits given in (A:91).

³ This follows from the fact that $|h(\theta)| < 1$ when $\theta$ is
sufficiently near $\theta'$.

⁴ A different method for deriving (A:97) was given in the author’s
paper, “Differentiation under the Expectation Sign in the Fundamental
Identity,” The Annals of Mathematical Statistics, Vol. 17 (1946).

We shall now compute an approximate value of $E_{\theta'}(n)$, neglecting
the excess of $Z_n$ over the boundaries. Since $\lim_{\theta=\theta'} h(\theta) = 0$, we obtain,
from (3:43),

(A:98)    $$\lim_{\theta=\theta'} L(\theta) = \frac{\log A}{\log A - \log B}$$

Hence

$$E_{\theta'}(Z_n^2) \sim \frac{\log A}{\log A - \log B}\,(\log B)^2 + \frac{-\log B}{\log A - \log B}\,(\log A)^2 = -\log B \,\log A$$

Thus an approximate value of $E_{\theta'}(n)$ is given by⁵

(A:99)    $$E_{\theta'}(n) \sim \frac{-\log B \,\log A}{E_{\theta'}(z^2)}$$
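The approximation (A:99) can be checked in the one situation where there is no excess over the boundaries: $z$ takes the values $+d$ and $-d$ with equal probability and $\log A$, $\log B$ are integral multiples of $d$. The sketch below is only an illustration under those assumptions; the function names are hypothetical and the step size is taken as $d = 1$, so that $\log A = a$ and $\log B = -b$ for positive integers $a$, $b$.

```python
from fractions import Fraction


def expected_duration(a_steps, b_steps):
    """Exact E(n) for a symmetric +/-1 walk started at 0 and stopped on
    reaching +a_steps or -b_steps.

    Solves e(k) = 1 + (e(k-1) + e(k+1)) / 2 with e(-b) = e(a) = 0 by writing
    every e(k) as const + coef * x, where x = e(-b + 1) is the one unknown.
    """
    const_prev, coef_prev = Fraction(0), Fraction(0)  # e(-b) = 0
    const_cur, coef_cur = Fraction(0), Fraction(1)    # e(-b+1) = x
    values = [(const_prev, coef_prev), (const_cur, coef_cur)]
    for _ in range(a_steps + b_steps - 1):
        # e(k+1) = 2 e(k) - e(k-1) - 2
        const_next = 2 * const_cur - const_prev - 2
        coef_next = 2 * coef_cur - coef_prev
        values.append((const_next, coef_next))
        const_prev, coef_prev = const_cur, coef_cur
        const_cur, coef_cur = const_next, coef_next
    const_a, coef_a = values[-1]          # position +a, where e(a) = 0
    x = -const_a / coef_a
    const_0, coef_0 = values[b_steps]     # position 0
    return const_0 + coef_0 * x


def zero_drift_asn(log_a, log_b, ez2):
    """Right-hand side of (A:99): -log B * log A / E(z^2)."""
    return -Fraction(log_b) * Fraction(log_a) / Fraction(ez2)


exact = expected_duration(3, 5)
approx = zero_drift_asn(3, -5, 1)
```

In this special case the two quantities coincide, which is consistent with the remark that the excess of $Z_n$ over the boundaries is the only source of error in (A:99).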


If the OC function $L(\theta)$ of the test is known exactly, close limits for
$E_\theta(n)$ can be derived which remain valid over the entire range of $\theta$.
We shall indicate briefly the derivation of such limits. Denote by
$f_\theta(z)$ the distribution of $z$ when $\theta$ is the true value of the parameter.
By the distribution of $z$ conjugate to the distribution $f_\theta(z)$ we shall
mean the distribution $e^{h(\theta)z} f_\theta(z)$. In important cases, such as for binomial
and normal distributions, to any given value $\theta$ of the parameter
there will correspond a value $\bar\theta$ such that $f_{\bar\theta}(z)$ is conjugate to
$f_\theta(z)$, i.e., $f_{\bar\theta}(z) = e^{h(\theta)z} f_\theta(z)$. We shall call $\bar\theta$ conjugate to $\theta$. It has
been shown elsewhere⁶ that

(A:100)    $$E_\theta^{*}(e^{h(\theta)Z_n}) = \frac{L(\bar\theta)}{L(\theta)} \quad\text{and}\quad E_\theta^{**}(e^{h(\theta)Z_n}) = \frac{1 - L(\bar\theta)}{1 - L(\theta)}$$

⁵ W. Allen Wallis obtained this approximation formula independently of the
author. It is included in the publication of the Statistical Research Group of
Columbia University, Techniques of Statistical Analysis, Chapter 17, Section 7.2,
McGraw-Hill, New York (1946).

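On the other hand,

(A:101)    $$E_\theta^{*}(e^{h(\theta)Z_n}) = e^{h(\theta)E_\theta^{*}(Z_n)}\, E_\theta^{*}\!\left(e^{h(\theta)[Z_n - E_\theta^{*}(Z_n)]}\right) = e^{h(\theta)E_\theta^{*}(Z_n)}\, E_\theta^{*}\!\left\{1 + \frac{[h(\theta)]^2}{2}\,[Z_n - E_\theta^{*}(Z_n)]^2 e^{v}\right\}$$

where $v$ lies between 0 and $h(\theta)[Z_n - E_\theta^{*}(Z_n)]$. Similarly

(A:102)    $$E_\theta^{**}(e^{h(\theta)Z_n}) = e^{h(\theta)E_\theta^{**}(Z_n)}\, E_\theta^{**}\!\left\{1 + \frac{[h(\theta)]^2}{2}\,[Z_n - E_\theta^{**}(Z_n)]^2 e^{v'}\right\}$$

where $v'$ lies between 0 and $h(\theta)[Z_n - E_\theta^{**}(Z_n)]$. From (A:100),
(A:101), and (A:102) we obtain

(A:103)    $$\frac{E_\theta^{*}(Z_n)}{E_\theta(z)} = \frac{1}{h(\theta)E_\theta(z)}\left[\log\frac{L(\bar\theta)}{L(\theta)} - \log\left(1 + \frac{[h(\theta)]^2}{2}\,E_\theta^{*}\{[Z_n - E_\theta^{*}(Z_n)]^2 e^{v}\}\right)\right]$$

and

(A:104)    $$\frac{E_\theta^{**}(Z_n)}{E_\theta(z)} = \frac{1}{h(\theta)E_\theta(z)}\left[\log\frac{1 - L(\bar\theta)}{1 - L(\theta)} - \log\left(1 + \frac{[h(\theta)]^2}{2}\,E_\theta^{**}\{[Z_n - E_\theta^{**}(Z_n)]^2 e^{v'}\}\right)\right]$$

Thus

(A:105)    $$E_\theta(n) = \frac{1}{h(\theta)E_\theta(z)}\left[L(\theta)\log\frac{L(\bar\theta)}{L(\theta)} + [1 - L(\theta)]\log\frac{1 - L(\bar\theta)}{1 - L(\theta)}\right] + R$$

where

(A:106)    $$R = -\frac{1}{h(\theta)E_\theta(z)}\left[L(\theta)\log\left(1 + \frac{[h(\theta)]^2}{2}\,E_\theta^{*}\{[Z_n - E_\theta^{*}(Z_n)]^2 e^{v}\}\right) + [1 - L(\theta)]\log\left(1 + \frac{[h(\theta)]^2}{2}\,E_\theta^{**}\{[Z_n - E_\theta^{**}(Z_n)]^2 e^{v'}\}\right)\right]$$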


⁶ See, for instance, the author's article on "Some Generalizations of the Theory
of Cumulative Sums of Random Variables," The Annals of Mathematical Statistics,
Vol. XVI (1945).

Since $h(\theta)E_\theta(z) < 0$ (see Section A.2.1), we see that $R \ge 0$. Hence a
lower bound for $E_\theta(n)$ is obtained by substituting 0 for $R$ in (A:105).

To obtain an upper bound for $E_\theta(n)$ we shall derive an upper bound
for $R$. Clearly

(A:107)    $$\{(Z_n - \log B) + [E_\theta^{*}(Z_n) - \log B]\}^2 \ge [Z_n - E_\theta^{*}(Z_n)]^2$$

whenever $Z_n \le \log B$. From this and (A:76) we obtain

(A:108)    $$[(Z_n - \log B) + \xi'_\theta]^2 \ge [Z_n - E_\theta^{*}(Z_n)]^2$$

whenever $Z_n \le \log B$. Similarly, we obtain

(A:109)    $$[(Z_n - \log A) + \xi_\theta]^2 \ge [Z_n - E_\theta^{**}(Z_n)]^2$$

whenever $Z_n \ge \log A$, where $\xi_\theta$ is given by (A:73). From (A:107),
(A:108), and (A:109) it follows that

(A:110)    $$E_\theta^{*}\{[Z_n - E_\theta^{*}(Z_n)]^2 e^{v}\} \le E_\theta^{*}\!\left[(Z_n - \log B + \xi'_\theta)^2\, e^{|Z_n - \log B + \xi'_\theta|\,|h(\theta)|}\right]$$

and

(A:111)    $$E_\theta^{**}\{[Z_n - E_\theta^{**}(Z_n)]^2 e^{v'}\} \le E_\theta^{**}\!\left[(Z_n - \log A + \xi_\theta)^2\, e^{|Z_n - \log A + \xi_\theta|\,|h(\theta)|}\right]$$

Furthermore, we have

(A:112)    $$E_\theta^{*}\!\left[(Z_n - \log B + \xi'_\theta)^2\, e^{|Z_n - \log B + \xi'_\theta|\,|h(\theta)|}\right] \le \max_{r \ge 0} E_\theta\!\left[(z + r + \xi'_\theta)^2\, e^{|z + r + \xi'_\theta|\,|h(\theta)|} \,\Big|\, z + r \le 0\right] = \rho' \text{ (say)}$$

and

(A:113)    $$E_\theta^{**}\!\left[(Z_n - \log A + \xi_\theta)^2\, e^{|Z_n - \log A + \xi_\theta|\,|h(\theta)|}\right] \le \max_{r \ge 0} E_\theta\!\left[(z - r + \xi_\theta)^2\, e^{|z - r + \xi_\theta|\,|h(\theta)|} \,\Big|\, z - r \ge 0\right] = \rho'' \text{ (say)}$$

From (A:106) and (A:110) through (A:113) we obtain the following
upper bound for $R$:

(A:114)    $$\bar R = -\frac{1}{h(\theta)E_\theta(z)}\left[L(\theta)\log\left(1 + \frac{\rho'\,[h(\theta)]^2}{2}\right) + [1 - L(\theta)]\log\left(1 + \frac{\rho''\,[h(\theta)]^2}{2}\right)\right]$$

An upper limit for $E_\theta(n)$ is obtained by substituting $\bar R$ for $R$ in
(A:105). The value of $\bar R$ will generally be small over the entire range
of $\theta$.

A.3.2 Calculation of the Quantities $\xi_\theta$ and $\xi'_\theta$ for Binomial and Normal
Distributions

Let $X$ be a random variable which can take only the values 0 and 1.
Let the probability that $X = 1$ be denoted by $\theta$. Then the distribution
of $x$ is given by $f(x, \theta)$, where $f(1, \theta) = \theta$ and $f(0, \theta) = 1 - \theta$.
Let $H_i$ be the hypothesis that $\theta = \theta_i$ ($i = 0, 1$). It can be assumed
without loss of generality that $\theta_1 > \theta_0$. It is clear that
$\log \dfrac{f(x, \theta_1)}{f(x, \theta_0)} > 0$ implies that $x = 1$ and consequently
$\log \dfrac{f(x, \theta_1)}{f(x, \theta_0)} = \log \dfrac{f(1, \theta_1)}{f(1, \theta_0)} = \log \dfrac{\theta_1}{\theta_0}$. Hence

(A:115)    $$\xi_\theta = \max_{r} E_\theta(z - r \mid z - r \ge 0) = \log\frac{\theta_1}{\theta_0}$$

Since $\log \dfrac{f(x, \theta_1)}{f(x, \theta_0)} < 0$ implies that $x = 0$, we have

(A:116)    $$\xi'_\theta = \min_{r} E_\theta(z + r \mid z + r \le 0) = \log\frac{1 - \theta_1}{1 - \theta_0}$$
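The two binomial quantities in (A:115) and (A:116) are simple enough to evaluate directly. The following sketch is illustrative only: the function names are hypothetical, and the brute-force search over a grid of values $r \ge 0$ is an assumption added here to confirm that the maximum in (A:115) is attained at $r = 0$.

```python
import math


def xi_binomial(theta0, theta1):
    """Return (xi, xi_prime) of (A:115) and (A:116) for a binomial variate."""
    if not 0.0 < theta0 < theta1 < 1.0:
        raise ValueError("need 0 < theta0 < theta1 < 1")
    xi = math.log(theta1 / theta0)
    xi_prime = math.log((1.0 - theta1) / (1.0 - theta0))
    return xi, xi_prime


def xi_by_search(theta, theta0, theta1, r_grid):
    """Brute-force maximum over the grid of E(z - r | z - r >= 0)."""
    z_up = math.log(theta1 / theta0)                    # value of z when x = 1
    z_down = math.log((1.0 - theta1) / (1.0 - theta0))  # value of z when x = 0
    outcomes = [(z_up, theta), (z_down, 1.0 - theta)]
    best = None
    for r in r_grid:
        kept = [(z - r, p) for z, p in outcomes if z - r >= 0.0 and p > 0.0]
        if not kept:
            continue  # the conditioning event has probability zero
        value = sum(v * p for v, p in kept) / sum(p for _, p in kept)
        best = value if best is None else max(best, value)
    return best


xi, xi_prime = xi_binomial(0.25, 0.5)
searched = xi_by_search(0.3, 0.25, 0.5, [0.05 * k for k in range(20)])
```

Note that neither quantity depends on the true value $\theta$ in the binomial case, since the conditioning event fixes the value of $x$.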

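Now we shall calculate the values $\xi_\theta$ and $\xi'_\theta$ when $X$ is normally
distributed with unit variance. Let

$$f(x, \theta_i) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta_i)^2} \qquad (i = 0, 1 \text{ and } \theta_1 > \theta_0)$$

and

$$f(x, \theta) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta)^2}$$

We may assume without loss of generality that $\theta_0 = -\Delta$ and $\theta_1 = \Delta$
where $\Delta > 0$, since this can always be achieved by a translation.
Then

(A:117)    $$z = \log\frac{f(x, \theta_1)}{f(x, \theta_0)} = 2\Delta x$$

Let $\phi(x)$ denote $\dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ and let $G(x)$ denote $\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_x^{\infty} e^{-t^2/2}\, dt$.
Let $t = x - \theta$. Then $z = 2\Delta(t + \theta)$ and

(A:118)    $$E_\theta(z - r \mid z - r \ge 0) = 2\Delta\, E_\theta\!\left(t + \theta - \frac{r}{2\Delta} \,\Big|\, t + \theta - \frac{r}{2\Delta} \ge 0\right) = \frac{2\Delta}{G(t_0)} \int_{t_0}^{\infty} (t - t_0)\,\phi(t)\, dt = \frac{2\Delta}{G(t_0)}\,[-t_0 G(t_0) + \phi(t_0)]$$

where

(A:119)    $$t_0 = \frac{r}{2\Delta} - \theta$$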

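In Section A.2.5, equation (A:56), it was proved that $[\phi(t_0)/G(t_0)] - t_0$
is a monotonically decreasing function of $t_0$. Hence the maximum
of $E_\theta(z - r \mid z - r \ge 0)$ is reached when $r = 0$, and consequently

(A:120)    $$\xi_\theta = \frac{2\Delta}{G(-\theta)}\,[\theta\, G(-\theta) + \phi(-\theta)] = 2\Delta\left[\theta + \frac{\phi(\theta)}{G(-\theta)}\right]$$

Now we shall calculate $\xi'_\theta$. We have

(A:121)    $$\xi'_\theta = \min_{r} E_\theta(z + r \mid z + r \le 0) = -\max_{r} E_\theta(-z - r \mid -z - r \ge 0) = -2\Delta \max_{r} E_\theta\!\left(-x - \frac{r}{2\Delta} \,\Big|\, -x - \frac{r}{2\Delta} \ge 0\right)$$

Let $t = -x + \theta$ and $t_0 = (r/2\Delta) + \theta$. Then

(A:122)    $$E_\theta\!\left(-x - \frac{r}{2\Delta} \,\Big|\, -x - \frac{r}{2\Delta} \ge 0\right) = E_\theta(t - t_0 \mid t - t_0 \ge 0) = \frac{1}{G(t_0)} \int_{t_0}^{\infty} (t - t_0)\,\phi(t)\, dt = \frac{\phi(t_0)}{G(t_0)} - t_0$$

Since this is a monotonically decreasing function of $t_0$, we have

(A:123)    $$\max_{r} E_\theta\!\left(-x - \frac{r}{2\Delta} \,\Big|\, -x - \frac{r}{2\Delta} \ge 0\right) = \frac{\phi(\theta)}{G(\theta)} - \theta$$

From (A:121) and (A:123) we obtain

(A:124)    $$\xi'_\theta = 2\Delta\left[\theta - \frac{\phi(\theta)}{G(\theta)}\right]$$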
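For the normal case with $\theta_0 = -\Delta$, $\theta_1 = \Delta$ and unit variance, both quantities reduce to the function $[\phi(t_0)/G(t_0)] - t_0$ evaluated at $r = 0$, that is, at $t_0 = -\theta$ for $\xi_\theta$ and at $t_0 = \theta$ for $\xi'_\theta$. The sketch below evaluates them numerically. The function names are hypothetical, and the use of the complementary error function to obtain $G$ is a choice made here for illustration.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def G(x):
    """Upper tail of the standard normal: integral of phi from x to infinity."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mean_excess(t0):
    """E(t - t0 | t >= t0) for a standard normal t, i.e. phi(t0)/G(t0) - t0."""
    return phi(t0) / G(t0) - t0


def xi_normal(theta, delta):
    """Return (xi, xi_prime) for theta0 = -delta, theta1 = +delta, sigma = 1."""
    xi = 2.0 * delta * mean_excess(-theta)        # r = 0 gives t0 = -theta
    xi_prime = -2.0 * delta * mean_excess(theta)  # r = 0 gives t0 = +theta
    return xi, xi_prime


xi_mid, xi_prime_mid = xi_normal(0.0, 1.0)
```

At $\theta = 0$ the two values are symmetric about zero, and evaluating `mean_excess` at a few increasing arguments illustrates the monotone decrease on which the choice $r = 0$ rests.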



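Formulas (A:120) and (A:124) have been derived for the case when
$\theta_0 = -\Delta$, $\theta_1 = \Delta$, and $\sigma = 1$. For general values $\theta_0$, $\theta_1$, and $\sigma$, the
values of $\xi_\theta$ and $\xi'_\theta$ are given by

(A:125)    $$\xi_\theta = \frac{1}{\sigma}\,(\theta_1 - \theta_0)\left[\lambda + \frac{\phi(\lambda)}{G(-\lambda)}\right]$$

and

(A:126)    $$\xi'_\theta = \frac{1}{\sigma}\,(\theta_1 - \theta_0)\left[\lambda - \frac{\phi(\lambda)}{G(\lambda)}\right]$$

where

$$\lambda = \frac{1}{\sigma}\left(\theta - \frac{\theta_0 + \theta_1}{2}\right)$$

A.4 DERIVATION OF EXACT FORMULAS FOR THE OC AND ASN FUNCTIONS
WHEN z CAN TAKE ONLY A FINITE NUMBER OF INTEGRAL
MULTIPLES OF A CONSTANT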


In this section we shall derive exact formulas for the OC and ASN
functions when $z = \log \dfrac{f(x, \theta_1)}{f(x, \theta_0)}$ can take only a finite number of integral
multiples of a positive constant $d$. This is a rather general result,
since any distribution of $z$ can be approximated arbitrarily closely by
a discrete distribution of the above type if the constant $d$ is chosen
sufficiently small.

To obtain the exact OC and ASN functions, we shall first derive
the exact probability distribution of the cumulative sum $Z_n = z_1 + \cdots + z_n$
at the termination of the sequential process. In what
follows in this section the probability of any relation and the expected
value of any random variable are determined under the assumption
that $\theta$ is the true value of the parameter.¹ However, to simplify notation,
we shall not put this in evidence in the formulas, i.e., we shall
write $P$ instead of $P_\theta$ and $E$ instead of $E_\theta$. Let $g_1$ and $g_2$ be two positive
integers such that $P(z = -g_1 d)$ and $P(z = g_2 d)$ are positive and
$z$ can take only integral multiples of $d$ which are $\ge -g_1 d$ and $\le g_2 d$.
Denote $P(z = id)$ by $h_i$. Then the moment-generating function of $z$
is given by

(A:127)    $$E(e^{zt}) = \sum_{i=-g_1}^{g_2} h_i e^{idt} = \phi(t) \text{ (say)}$$

To obtain the roots of the equation $\phi(t) = 1$, we let $e^{td} = u$ and
solve the equation:

(A:128)    $$\sum_{i=-g_1}^{g_2} h_i u^{i} = 1$$

Let $g$ denote $g_1 + g_2$ and let the $g$ roots of (A:128) be $u_1, \cdots, u_g$, respectively.
We shall assume that no two roots are equal, i.e., $u_i \ne u_j$
for $i \ne j$. Substituting $u_i$ for $e^{td}$ in the fundamental identity (A:16)
we obtain

(A:129)    $$E(u_i^{Z_n/d}) = 1 \qquad (i = 1, \cdots, g)$$

Let $[a]$ be the smallest integer $\ge (\log A)/d$, and $[b]$ the largest integer
$\le (\log B)/d$. Then $Z_n/d$ can take only the values

(A:130)    $$([b] - g_1 + 1),\ ([b] - g_1 + 2),\ \cdots,\ [b],\ [a],\ ([a] + 1),\ \cdots,\ ([a] + g_2 - 1)$$

¹ If there are several unknown parameters, $\theta$ denotes the set of all parameters.

Denote the $g$ different values in (A:130) by $c_1, \cdots, c_g$, respectively.
Furthermore, denote $P(Z_n = c_i d)$ by $\xi_i$. Then equations (A:129) can
be written as

(A:131)    $$\sum_{j=1}^{g} \xi_j u_i^{c_j} = 1 \qquad (i = 1, \cdots, g)$$

Let $\Delta$ be the determinant value of the matrix $\|u_i^{c_j}\|$ ($i, j = 1, \cdots, g$)
and let $\Delta_j$ be the determinant we obtain from $\Delta$ by substituting 1 for
the elements in the $j$th column. If $\Delta \ne 0$, it follows from (A:131) that
$P(Z_n = c_j d) = \xi_j$ is given by

(A:132)    $$\xi_j = \frac{\Delta_j}{\Delta} \qquad (j = 1, \cdots, g)$$

Thus, the probability $L(\theta)$ that the process will terminate with $Z_n \le \log B$
is given by

(A:133)    $$L(\theta) = \sum_{j} \frac{\Delta_j}{\Delta}$$

where the summation is to be taken over all values $j$ for which $c_j d \le \log B$.
Equation (A:133) is an exact equation of the OC function.

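From the probability distribution of $Z_n$ we can easily derive the expected
value $E_\theta(n)$ of $n$. In fact, in Section A.3 it has been shown that

$$E_\theta(n) = \frac{E_\theta(Z_n)}{E_\theta(z)}$$

But

(A:134)    $$E_\theta(Z_n) = \sum_{j=1}^{g} \frac{c_j \Delta_j d}{\Delta}$$

Hence

(A:135)    $$E_\theta(n) = \frac{1}{E_\theta(z)} \sum_{j=1}^{g} \frac{c_j \Delta_j d}{\Delta}$$

is the exact equation of the ASN function.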

The method of obtaining the probabilities $\xi_1, \cdots, \xi_g$, as described
above, requires the computation of the roots of the polynomial equation
(A:128). This is not necessary, however, if a method given by
Girshick is used.² Girshick proceeds as follows. Multiplying
$\left(\sum_{i=-g_1}^{g_2} h_i u^{i} - 1\right)$ by $u^{g_1}$ and $\left(\sum_{j=1}^{g} \xi_j u^{c_j} - 1\right)$ by $u^{g_1 - [b] - 1}$, we obtain two
polynomials $f(u)$ and $F(u)$, where $f(u)$ is of degree $g_1 + g_2 = g$ and
$F(u)$ of degree $g + [a] - [b] - 2$. According to (A:128) and (A:131),
every root of $f(u)$ is also a root of $F(u)$. Hence

$$F(u) = f(u) f^{*}(u)$$

where $f^{*}(u)$ is a polynomial of degree $[a] - [b] - 2$, i.e.,

$$f^{*}(u) = k_0 + k_1 u + \cdots + k_{[a]-[b]-2}\, u^{[a]-[b]-2}$$

Putting the coefficient of any power of $u$ in $F(u)$ equal to the coefficient
of the same power of $u$ in $f(u) f^{*}(u)$, we obtain a system of
$g + [a] - [b] - 1$ linear equations in the $g + [a] - [b] - 1$ unknowns
$\xi_1, \cdots, \xi_g, k_0, k_1, \cdots, k_{[a]-[b]-2}$, from which these unknowns can be
determined. Thus, the probabilities $\xi_1, \cdots, \xi_g$ can be determined
without solving the polynomial equation (A:128). This advantage is,
however, bought for the price of an increased number of linear equations
to be solved. If the roots of the polynomial equation (A:128)
are computed, only $g$ linear equations have to be solved for determining
$\xi_1, \cdots, \xi_g$. If Girshick's method is used, no polynomial equation
is to be solved, but the number of linear equations is increased to
$g + [a] - [b] - 1$.

² M. A. Girshick, "Contributions to the Theory of Sequential Analysis," The
Annals of Mathematical Statistics, Vol. 17 (1946).

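If $g_2 = 1$, the OC function $L(\theta)$ is a simple expression of the roots
$u_1, \cdots, u_g$. In fact, $L(\theta) = P(Z_n \le \log B) = 1 - P(Z_n \ge \log A) = 1 - \xi_g$.
We have

$$\Delta = \begin{vmatrix} u_1^{[b]-g_1+1} & \cdots & u_1^{[b]} & u_1^{[a]} \\ \vdots & & \vdots & \vdots \\ u_g^{[b]-g_1+1} & \cdots & u_g^{[b]} & u_g^{[a]} \end{vmatrix}, \qquad \Delta_g = \begin{vmatrix} u_1^{[b]-g_1+1} & \cdots & u_1^{[b]} & 1 \\ \vdots & & \vdots & \vdots \\ u_g^{[b]-g_1+1} & \cdots & u_g^{[b]} & 1 \end{vmatrix}$$

The value of the ratio $\Delta_g/\Delta$ is not changed if we multiply the $i$th
row of $\Delta$, as well as that of $\Delta_g$, by $u_i^{g_1-[b]-1}$. Thus

$$\frac{\Delta_g}{\Delta} = \begin{vmatrix} 1 & u_1 & \cdots & u_1^{g_1-1} & u_1^{g_1-1-[b]} \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & u_g & \cdots & u_g^{g_1-1} & u_g^{g_1-1-[b]} \end{vmatrix} \Bigg/ \begin{vmatrix} 1 & u_1 & \cdots & u_1^{g_1-1} & u_1^{g_1-1+[a]-[b]} \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & u_g & \cdots & u_g^{g_1-1} & u_g^{g_1-1+[a]-[b]} \end{vmatrix}$$

The cofactor of each element in the last column is a Vandermonde
determinant. Expanding the determinants in the numerator and denominator
according to their last columns and dividing numerator and
denominator by the Vandermonde determinant,

$$\begin{vmatrix} 1 & u_1 & u_1^2 & \cdots & u_1^{g_1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & u_g & u_g^2 & \cdots & u_g^{g_1} \end{vmatrix} \qquad (g_1 = g - 1)$$

we obtain

$$\xi_g = \frac{\Delta_g}{\Delta} = \frac{\displaystyle\sum_{i=1}^{g} \frac{u_i^{\,g_1-1-[b]}}{\prod_{j \ne i}(u_i - u_j)}}{\displaystyle\sum_{i=1}^{g} \frac{u_i^{\,g_1-1+[a]-[b]}}{\prod_{j \ne i}(u_i - u_j)}}$$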

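We shall illustrate the derivation of the exact OC and ASN functions
by a simple example. Let $x$ be a random variable which can
take only the values 0 and 1. Denote by $H_i$ ($i = 0, 1$) the hypothesis
that the probability that $x = 1$ is equal to $p_i$ ($i = 0, 1$). Let

$$p_0 = \frac{1 - e^{-2}}{e - e^{-2}} \quad\text{and}\quad p_1 = \frac{e - e^{-1}}{e - e^{-2}}$$

Consider the sequential test for testing $H_0$ against $H_1$. We shall compute
the probability that the process will terminate with the acceptance
of $H_0$, and the expected number of trials required by the test,
when the true probability that $x = 1$ is equal to $p = \frac{3}{7}$. In what
follows in this section, all probability statements and expected values
refer to the case when $p = \frac{3}{7}$.

First we compute $\phi(t) = E(e^{zt})$. Since $z$ can take only the values

$$\log\frac{p_1}{p_0} = \log e = 1 \quad\text{and}\quad \log\frac{1 - p_1}{1 - p_0} = \log e^{-2} = -2$$

with probabilities $\frac{3}{7}$ and $\frac{4}{7}$, respectively, we have

$$\phi(t) = \tfrac{3}{7}\, e^{t} + \tfrac{4}{7}\, e^{-2t}$$

Letting $e^{t} = u$ and solving the equation

$$\tfrac{3}{7}\, u + \tfrac{4}{7}\, u^{-2} = 1$$

we obtain the roots $u_1 = 1$, $u_2 = 2$, and $u_3 = -\frac{2}{3}$. The integers
$c_1$, $c_2$, $c_3$ are given by

$$c_1 = \log B - 1, \quad c_2 = \log B, \quad c_3 = \log A$$

Hence

$$\Delta = \begin{vmatrix} 1 & 1 & 1 \\ 2^{\log B - 1} & 2^{\log B} & 2^{\log A} \\ (-\tfrac{2}{3})^{\log B - 1} & (-\tfrac{2}{3})^{\log B} & (-\tfrac{2}{3})^{\log A} \end{vmatrix}, \qquad \Delta_1 = \begin{vmatrix} 1 & 1 & 1 \\ 1 & 2^{\log B} & 2^{\log A} \\ 1 & (-\tfrac{2}{3})^{\log B} & (-\tfrac{2}{3})^{\log A} \end{vmatrix}$$

$$\Delta_2 = \begin{vmatrix} 1 & 1 & 1 \\ 2^{\log B - 1} & 1 & 2^{\log A} \\ (-\tfrac{2}{3})^{\log B - 1} & 1 & (-\tfrac{2}{3})^{\log A} \end{vmatrix}, \qquad \Delta_3 = \begin{vmatrix} 1 & 1 & 1 \\ 2^{\log B - 1} & 2^{\log B} & 1 \\ (-\tfrac{2}{3})^{\log B - 1} & (-\tfrac{2}{3})^{\log B} & 1 \end{vmatrix}$$

Then the probability that $H_0$ will be accepted is given by

$$L = \frac{\Delta_1 + \Delta_2}{\Delta}$$

The expected value of $n$ is given by

$$E(n) = \frac{1}{E(z)}\,\frac{c_1\Delta_1 + c_2\Delta_2 + c_3\Delta_3}{\Delta} = -\frac{7}{5}\,\frac{(\log B - 1)\Delta_1 + (\log B)\Delta_2 + (\log A)\Delta_3}{\Delta} = \frac{7}{5}\,\frac{(-\log B + 1)\Delta_1 + (-\log B)\Delta_2 - (\log A)\Delta_3}{\Delta}$$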
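The determinant formulas of this example can be evaluated in exact rational arithmetic and compared with a direct calculation. The sketch below is an illustration under stated assumptions: $\log A$ and $\log B$ are integers with $\log B < 0 < \log A$, $z$ equals $+1$ with probability $3/7$ and $-2$ with probability $4/7$, and the roots are $1$, $2$, $-2/3$. All function names are hypothetical. The comparison routine `direct_oc_asn` is not part of the method described in this section; it solves the first-step equations of the stopped walk as an independent check.

```python
from fractions import Fraction


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def exact_oc_asn(log_a, log_b):
    """OC and ASN values from the determinants Delta, Delta_1, ..., Delta_3."""
    roots = [Fraction(1), Fraction(2), Fraction(-2, 3)]
    c = [log_b - 1, log_b, log_a]
    m = [[u ** cj for cj in c] for u in roots]
    delta = det3(m)
    xi = []
    for j in range(3):
        # Delta_j: replace the j-th column by ones
        mj = [[Fraction(1) if k == j else row[k] for k in range(3)] for row in m]
        xi.append(det3(mj) / delta)
    ez = Fraction(3, 7) * 1 + Fraction(4, 7) * (-2)   # E(z) = -5/7
    oc = xi[0] + xi[1]
    asn = sum(cj * x for cj, x in zip(c, xi)) / ez
    return oc, asn


def solve(a, b):
    """Gauss-Jordan elimination over Fractions."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def direct_oc_asn(log_a, log_b):
    """Independent check: first-step equations for the stopped walk."""
    steps = ((1, Fraction(3, 7)), (-2, Fraction(4, 7)))
    states = list(range(log_b + 1, log_a))
    idx = {k: i for i, k in enumerate(states)}
    n = len(states)
    a = [[Fraction(0)] * n for _ in range(n)]
    rhs_oc = [Fraction(0)] * n
    rhs_n = [Fraction(1)] * n
    for k in states:
        i = idx[k]
        a[i][i] += 1
        for step, p in steps:
            nxt = k + step
            if nxt in idx:
                a[i][idx[nxt]] -= p
            elif nxt <= log_b:
                rhs_oc[i] += p     # absorbed at the lower boundary
    return solve(a, rhs_oc)[idx[0]], solve(a, rhs_n)[idx[0]]
```

For instance, with $\log A = 1$ and $\log B = -1$ the test stops after a single observation, so both routines give $L = 4/7$ and $E(n) = 1$.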


A.5 THE CHARACTERISTIC FUNCTION AND HIGHER MOMENTS OF n 

A.5.1 Derivation of Approximate Formulas Neglecting the Excess of 
the Cumulative Sum over the Boundaries 

Let $\bar Z_n$ be a random variable defined as follows: $\bar Z_n = \log A$ if
$Z_n = z_1 + \cdots + z_n \ge \log A$, and $\bar Z_n = \log B$ if $Z_n \le \log B$. Denote
the difference $\bar Z_n - Z_n$ by $\epsilon$. Then $\epsilon$ is a random variable.

In what follows in this section we shall neglect $\epsilon$, i.e., we shall substitute
0 for $\epsilon$. No error is committed by doing so in the special case
when $z$ can take only two values, $d$ and $-d$, and the ratios $(\log A)/d$
and $(\log B)/d$ are integers, since in this case $\epsilon$ is exactly 0. Apart
from this special case $\epsilon$ will not be identical with the constant 0.
However, the smaller $|E(z)|$ and $E(z^2)$, the smaller the error we commit
by neglecting $\epsilon$. In fact, for arbitrarily small positive numbers
$\delta_1$ and $\delta_2$ the inequality $P(|\epsilon| \le \delta_1) \ge 1 - \delta_2$ will hold if $|E(z)|$
and $E(z^2)$ are sufficiently small. Thus, in the limiting case when $E(z)$
and $E(z^2)$ approach 0, the random variable $\epsilon$ reduces to the constant 0.

As in the preceding section, all probability statements and expected
values will refer to the case in which $\theta$ is the true parameter point,
without putting this in evidence in the formulas by using $\theta$ as a subscript
to the operators $P$ and $E$. Let $\phi(t)$ be the moment generating
function of $z$, i.e.,

$$\phi(t) = E(e^{zt})$$

To derive an approximation to the characteristic function of $n$, we
shall consider the equation

(A:136)    $$-\log \phi(t) = \tau$$

where $\tau$ is a purely imaginary quantity. It will be assumed that $z$
satisfies the conditions of lemma A.1. Then, according to lemma A.1,
the equation $-\log \phi(t) = 0$ has exactly two real roots in $t$; they are
$t = 0$ and $t = h$ ($h \ne 0$). Furthermore $\phi'(0)$ and $\phi'(h)$ both are unequal
to 0. Hence, if $\phi(t)$ is not singular at $t = 0$ and $t = h$, equation
(A:136) has two roots, $t_1(\tau)$ and $t_2(\tau)$, for sufficiently small values of
$|\tau|$ such that $\lim_{\tau=0} t_1(\tau) = 0$ and $\lim_{\tau=0} t_2(\tau) = h$. Identity (A:16) can
be written as

(A:137)    $$L\,E^{*}\{e^{Z_n t}[\phi(t)]^{-n}\} + (1 - L)\,E^{**}\{e^{Z_n t}[\phi(t)]^{-n}\} = 1$$

where $L$ denotes the probability that the test procedure leads to the
acceptance of $H_0$, $E^{*}$ stands for conditional expected value under the
restriction that the process leads to the acceptance of $H_0$, and $E^{**}$ stands
for conditional expected value under the restriction that the process
leads to the rejection of $H_0$. Neglecting the excess of $Z_n$ over the
boundaries, we have $Z_n = \log B$ when the process leads to the acceptance
of $H_0$, and $Z_n = \log A$ when the process leads to the rejection
of $H_0$. Hence (A:137) can be written as

(A:138)    $$L\,B^{t}\,E^{*}[\phi(t)]^{-n} + (1 - L)\,A^{t}\,E^{**}[\phi(t)]^{-n} = 1$$

This identity is valid for all values of $t$ for which $|\phi(t)| \ge 1$.¹ Letting
$t = t_1(\tau)$ and $t = t_2(\tau)$, we obtain, from (A:138),

(A:139)    $$L\,B^{t_1(\tau)}\,E^{*}(e^{\tau n}) + (1 - L)\,A^{t_1(\tau)}\,E^{**}(e^{\tau n}) = 1$$

and

(A:140)    $$L\,B^{t_2(\tau)}\,E^{*}(e^{\tau n}) + (1 - L)\,A^{t_2(\tau)}\,E^{**}(e^{\tau n}) = 1$$

Solving these equations in $E^{*}(e^{\tau n})$ and $E^{**}(e^{\tau n})$, we obtain

(A:141)    $$E^{*}(e^{\tau n}) = \frac{A^{t_2(\tau)} - A^{t_1(\tau)}}{L\,[B^{t_1(\tau)} A^{t_2(\tau)} - A^{t_1(\tau)} B^{t_2(\tau)}]}$$

and

(A:142)    $$E^{**}(e^{\tau n}) = \frac{B^{t_1(\tau)} - B^{t_2(\tau)}}{(1 - L)\,[B^{t_1(\tau)} A^{t_2(\tau)} - A^{t_1(\tau)} B^{t_2(\tau)}]}$$

for all imaginary values $\tau$.

The unconditional expected value $E(e^{\tau n})$ is clearly equal to

(A:143)    $$E(e^{\tau n}) = L\,E^{*}(e^{\tau n}) + (1 - L)\,E^{**}(e^{\tau n})$$

Hence, the characteristic function of $n$ is given by

(A:144)    $$\psi(\tau) = E(e^{\tau n}) = \frac{A^{t_2(\tau)} - A^{t_1(\tau)} + B^{t_1(\tau)} - B^{t_2(\tau)}}{B^{t_1(\tau)} A^{t_2(\tau)} - A^{t_1(\tau)} B^{t_2(\tau)}}$$

(for all imaginary $\tau$).

By definition, the expected value $E(e^{\tau n})$ is the characteristic function
of $n$, and (A:144) gives the desired approximation formula when
the excess of $Z_n$ over the boundaries can be neglected. Our derivations
yield also approximation formulas for $\psi^{*}(\tau) = E^{*}(e^{\tau n})$ and
$\psi^{**}(\tau) = E^{**}(e^{\tau n})$. The function $\psi^{*}(\tau)$ can be interpreted as the characteristic
function of the conditional distribution of $n$ when the process
leads to the acceptance of $H_0$, and $\psi^{**}(\tau)$ can be interpreted as the
characteristic function of the distribution of $n$ in the subpopulation
of samples leading to the rejection of $H_0$.

¹ This follows from the considerations in Section A.2.2, since D* is the whole
complex plane in our case.

As an illustration we shall determine $\psi^{*}(\tau)$, $\psi^{**}(\tau)$, and $\psi(\tau)$ when
$z$ has a normal distribution. Denote by $\mu$ the mean of $z$ and by $\sigma$ the
standard deviation of $z$. Then equation (A:136) can be written as

$$-\log \phi(t) = -\mu t - \frac{\sigma^2}{2}\,t^2 = \tau$$

Hence

(A:145)    $$t = \frac{-\mu \pm \sqrt{\mu^2 - 2\sigma^2\tau}}{\sigma^2}$$

Thus

(A:146)    $$t_1(\tau) = -\frac{\mu}{\sigma^2} + \frac{1}{\sigma^2}\sqrt{\mu^2 - 2\sigma^2\tau}$$

and

(A:147)    $$t_2(\tau) = -\frac{\mu}{\sigma^2} - \frac{1}{\sigma^2}\sqrt{\mu^2 - 2\sigma^2\tau}$$

where the sign of the square root is determined so that the real part of
$\sqrt{\mu^2 - 2\sigma^2\tau}$ is positive. Substituting these values for $t_1(\tau)$ and $t_2(\tau)$
in (A:141), (A:142), and (A:144), we obtain $\psi^{*}(\tau)$, $\psi^{**}(\tau)$, and $\psi(\tau)$ in
the case when $z$ is normally distributed. According to formula (3:43),
an approximation to $L$ is given by

(A:148)    $$L \sim \frac{A^{h} - 1}{A^{h} - B^{h}}$$

When $z$ is normally distributed we have

(A:149)    $$h = -\frac{2\mu}{\sigma^2}$$

It is of interest to consider the following two limiting cases: (1)
$B = 0$ and $A$ is a finite positive value; (2) $B$ is a finite positive value
and $A = +\infty$. It can be shown that $E(n)$ will be finite in case (1)
only if $E(z) > 0$. Similarly, $E(n)$ will be finite in case (2) only if
$E(z) < 0$. Thus, in case (1) we shall assume that $E(z) > 0$, and in
case (2) we shall assume that $E(z) < 0$. To obtain the characteristic
function $\psi(\tau)$ of $n$ in case (1), we have to determine the limiting value
of the right-hand member of (A:144) when $B \to 0$. For this purpose
we shall first derive the limiting value of $B^{t_2(\tau)}/B^{t_1(\tau)} = B^{t_2(\tau) - t_1(\tau)}$ when
$B \to 0$. Since in case (1) $E(z)$ is assumed to be $> 0$, the quantity
$h = \lim_{\tau=0} t_2(\tau)$ must be negative, as has been shown in Section A.2.1.
Hence, for small $\tau$ the real part of $t_2(\tau)$ is negative. On the other hand,
the real part of $t_1(\tau)$ approaches 0 as $\tau \to 0$. Thus, for small $\tau$ the
real part of $t_2(\tau) - t_1(\tau)$ is negative, and, therefore,

(A:150)    $$\lim_{B=0} \left|B^{t_2(\tau) - t_1(\tau)}\right| = +\infty$$

From (A:150) and from the relation $\lim_{B=0} |B^{t_2(\tau)}| = \infty$, it follows that
with $B \to 0$ the right-hand member of (A:144) converges to

(A:151)    $$A^{-t_1(\tau)}$$

Thus, if $E(z) > 0$ the characteristic function of $n$ in case (1) is given
by (A:151). When $z$ is normally distributed, $t_1(\tau)$ is given by (A:146).
Hence, for normally distributed $z$ with $\mu > 0$ the characteristic function
of $n$ in case (1) is given by


(A:152)    $$\psi(\tau) = \exp\left\{-\frac{\log A}{\sigma^2}\left(\sqrt{\mu^2 - 2\sigma^2\tau} - \mu\right)\right\}$$

In case (2) we have assumed that $E(z) < 0$. Hence $t_2(\tau)$ and
$t_2(\tau) - t_1(\tau)$ will have a positive real part for small $\tau$. Thus,

(A:153)    $$\lim_{A=\infty} \left|A^{t_2(\tau)}\right| = \lim_{A=\infty} \left|A^{t_2(\tau) - t_1(\tau)}\right| = +\infty$$

From (A:153) it follows that the limiting value of the right-hand member
of (A:144) when $A \to \infty$ is given by

(A:154)    $$B^{-t_1(\tau)}$$

Thus, if $E(z) < 0$, the characteristic function of $n$ in case (2) is
given by (A:154).

The moments of $n$ can be obtained by differentiating the characteristic
function of $n$. For any positive integer $r$ the $r$th moment of $n$ is
given by

(A:155)    $$E(n^{r}) = \frac{d^{r}}{d\tau^{r}}\,\psi(\tau)\Big|_{\tau=0}$$

We can also obtain the conditional moments of $n$ in the subpopulation
of samples for which $Z_n \le \log B$, as well as in the subpopulation
of samples for which $Z_n \ge \log A$. Let $E^{*}(n^{r})$ denote the conditional
expected value of $n^{r}$ in the subpopulation $Z_n \le \log B$, and let $E^{**}(n^{r})$
denote the expected value of $n^{r}$ in the subpopulation $Z_n \ge \log A$.
Then we have

$$E^{*}(n^{r}) = \frac{d^{r}}{d\tau^{r}}\,\psi^{*}(\tau)\Big|_{\tau=0} \quad\text{and}\quad E^{**}(n^{r}) = \frac{d^{r}}{d\tau^{r}}\,\psi^{**}(\tau)\Big|_{\tau=0}$$

where $\psi^{*}(\tau)$ and $\psi^{**}(\tau)$ are the conditional characteristic functions
given in (A:141) and (A:142).
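As a small numerical illustration of obtaining moments by differentiation, consider normally distributed $z$ with $\mu > 0$ in the limiting case $B = 0$, where the characteristic function is $A^{-t_1(\tau)}$ with $t_1(\tau)$ as in (A:146). The sketch below approximates the first two derivatives at $\tau = 0$ by central differences. Evaluating the function at small real values of $\tau$ (rather than imaginary ones) is an assumption made here for numerical convenience; the function names and step sizes are hypothetical.

```python
import math


def psi(tau, a, mu, sigma):
    """A ** (-t1(tau)) with a = log A, for real tau < mu^2 / (2 sigma^2)."""
    root = math.sqrt(mu * mu - 2.0 * sigma * sigma * tau)
    t1 = (-mu + root) / (sigma * sigma)
    return math.exp(-a * t1)


def first_two_moments(a, mu, sigma, h1=1e-4, h2=1e-3):
    """Central-difference approximations to E(n) and E(n^2)."""
    def f(tau):
        return psi(tau, a, mu, sigma)

    mean = (f(h1) - f(-h1)) / (2.0 * h1)
    second = (f(h2) - 2.0 * f(0.0) + f(-h2)) / (h2 * h2)
    return mean, second


mean_n, second_n = first_two_moments(a=3.0, mu=0.5, sigma=1.0)
variance_n = second_n - mean_n ** 2
```

Differentiating the logarithm of the same function analytically gives the values $(\log A)/\mu$ for the mean and $(\log A)\,\sigma^2/\mu^3$ for the variance, with which the numerical results can be compared.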

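It may be of interest to note that $\dfrac{d^{r}}{d\tau^{r}}\,\psi^{*}(\tau)$, $\dfrac{d^{r}}{d\tau^{r}}\,\psi^{**}(\tau)$, and, therefore,
also $E(n^{r}) = \dfrac{d^{r}}{d\tau^{r}}\,\psi(\tau)$ at $\tau = 0$ can be obtained from identity (A:138) directly
by successive differentiation. In fact, (A:138) can be written as

(A:156)    $$L\,B^{t}\,\psi^{*}[-\log \phi(t)] + (1 - L)\,A^{t}\,\psi^{**}[-\log \phi(t)] = 1$$

Taking the first $r$ derivatives of (A:156) with respect to $t$ at $t = 0$
and $t = h$, we obtain a system of $2r$ linear equations in the $2r$ unknowns
$\dfrac{d^{j}\psi^{*}(\tau)}{d\tau^{j}}\Big|_{\tau=0}$ and $\dfrac{d^{j}\psi^{**}(\tau)}{d\tau^{j}}\Big|_{\tau=0}$ ($j = 1, \cdots, r$) from which
these unknowns can be determined. For example, $\dfrac{d\psi^{*}(\tau)}{d\tau}\Big|_{\tau=0}$ and
$\dfrac{d\psi^{**}(\tau)}{d\tau}\Big|_{\tau=0}$ can be determined as follows. Taking the first derivative
of (A:156) with respect to $t$ we obtain

(A:157)    $$L(\log B)B^{t}\psi^{*}(\tau) - L\,B^{t}\,\frac{\phi'(t)}{\phi(t)}\,\frac{d\psi^{*}(\tau)}{d\tau} + (1 - L)(\log A)A^{t}\psi^{**}(\tau) - (1 - L)A^{t}\,\frac{\phi'(t)}{\phi(t)}\,\frac{d\psi^{**}(\tau)}{d\tau} = 0 \qquad [\tau = -\log \phi(t)]$$

Letting $t = 0$ and $t = h$ we obtain the equations

(A:158)    $$L\log B - L\,\frac{\phi'(0)}{\phi(0)}\,\frac{d\psi^{*}(\tau)}{d\tau}\Big|_{\tau=0} + (1 - L)\log A - (1 - L)\,\frac{\phi'(0)}{\phi(0)}\,\frac{d\psi^{**}(\tau)}{d\tau}\Big|_{\tau=0} = 0$$

(A:159)    $$L(\log B)B^{h} - L\,B^{h}\,\frac{\phi'(h)}{\phi(h)}\,\frac{d\psi^{*}(\tau)}{d\tau}\Big|_{\tau=0} + (1 - L)(\log A)A^{h} - (1 - L)A^{h}\,\frac{\phi'(h)}{\phi(h)}\,\frac{d\psi^{**}(\tau)}{d\tau}\Big|_{\tau=0} = 0$$

from which $\dfrac{d\psi^{*}(\tau)}{d\tau}\Big|_{\tau=0}$ and $\dfrac{d\psi^{**}(\tau)}{d\tau}\Big|_{\tau=0}$ can be determined.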


A.5.2 Derivation of Exact Formulas When z Can Take Only a Finite 
Number of Integral Multiples of a Constant 

We shall use here the notation defined in Section A.4 without any
further explanation. Let $\psi_i(\tau)$ denote the characteristic function of
the conditional distribution of $n$ in the subpopulation of samples for
which $Z_n = c_i d$ ($i = 1, \cdots, g$). The equation in $t$

(A:160)    $$\phi(t) = e^{-\tau}$$

has $g$ roots $t_1(\tau), \cdots, t_g(\tau)$ such that

(A:161)    $$\lim_{\tau=0} e^{t_i(\tau)d} = u_i \qquad (i = 1, \cdots, g)$$

The fundamental identity (A:16) can be written as

(A:162)    $$\sum_{j=1}^{g} \xi_j\, e^{c_j d t}\, \psi_j[-\log \phi(t)] = 1$$

Substituting $t_i(\tau)$ for $t$ in (A:162), we obtain

(A:163)    $$\sum_{j=1}^{g} \xi_j\, e^{c_j t_i(\tau) d}\, \psi_j(\tau) = 1 \qquad (i = 1, \cdots, g)$$

These equations are linear in the unknowns $\psi_1(\tau), \cdots, \psi_g(\tau)$. The
determinant of these equations is given by

(A:164)    $$\delta(\tau) = \begin{vmatrix} \xi_1 e^{c_1 t_1(\tau) d} & \cdots & \xi_g e^{c_g t_1(\tau) d} \\ \xi_1 e^{c_1 t_2(\tau) d} & \cdots & \xi_g e^{c_g t_2(\tau) d} \\ \vdots & & \vdots \\ \xi_1 e^{c_1 t_g(\tau) d} & \cdots & \xi_g e^{c_g t_g(\tau) d} \end{vmatrix}$$

Obviously, $\delta(0) = \xi_1 \xi_2 \cdots \xi_g \Delta$. Hence, if $\xi_i \ne 0$ ($i = 1, \cdots, g$) and
$\Delta \ne 0$, then $\delta(0) \ne 0$, and consequently $\delta(\tau) \ne 0$ for any $\tau$ with sufficiently
small absolute value. Thus, $\psi_1(\tau), \cdots, \psi_g(\tau)$ can be obtained
by solving the linear equations (A:163).¹ The characteristic function
$\psi(\tau)$ of the unconditional distribution of $n$ is given by

(A:165)    $$\psi(\tau) = \sum_{i=1}^{g} \xi_i \psi_i(\tau)$$

For any positive integer $r$, the exact $r$th moment of $n$, i.e., $E(n^{r})$, is
given by the $r$th derivative of $\psi(\tau)$ with respect to $\tau$ at $\tau = 0$.


A.6 APPROXIMATE DISTRIBUTION OF n WHEN z IS NORMALLY DIS- 
TRIBUTED 


A.6.1 The Case When B = 0 and A Is Finite 


In this case we have assumed that $E(z) = \mu > 0$. Then the approximate
characteristic function of $n$, if the excess of $Z_n$ over the
boundaries is neglected, is given by (A:152). Let

(A:166)    $$m = \frac{\mu^2}{2\sigma^2}\, n$$

¹ This method of determining $\psi_1(\tau), \cdots, \psi_g(\tau)$ requires the computation of
the roots of equation (A:160). This can be avoided, as Girshick has shown in
his paper mentioned in Section A.4, if a device is used similar to that applied by
him for determining $\xi_1, \cdots, \xi_g$ (see Section A.4).

Then the characteristic function of $m$ is given by

(A:167)    $$\bar\phi(t) = e^{c[1 - \sqrt{1 - t}]}$$

where

(A:168)    $$c = \frac{a\mu}{\sigma^2} > 0$$

(A:169)    $$a = \log A$$

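The sign of the square root in (A:167) is determined so that the real
part of $\sqrt{1 - t}$ is positive. The distribution of $m$ is given by

(A:170)    $$\frac{1}{2\pi i} \int_{-i\infty}^{i\infty} e^{c(1 - \sqrt{1 - t}) - mt}\, dt$$

Let

(A:171)    $$G(c, m) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} e^{-c\sqrt{1 - t} - mt}\, dt$$

and

(A:172)    $$H(c, m) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \frac{1}{\sqrt{1 - t}}\, e^{-c\sqrt{1 - t} - mt}\, dt$$

Since

(A:173)    $$\frac{1}{2\pi i}\,\frac{d}{dt}\, e^{-c\sqrt{1 - t} - mt} = \frac{1}{2\pi i}\left(\frac{c}{2\sqrt{1 - t}} - m\right) e^{-c\sqrt{1 - t} - mt}$$

we have

(A:174)    $$\frac{c}{2}\, H(c, m) - m\, G(c, m) = \frac{1}{2\pi i}\left[e^{-c\sqrt{1 - t} - mt}\right]_{-i\infty}^{i\infty} = 0$$

From (A:171) and (A:172) we obtain

(A:175)    $$\frac{\partial H(c, m)}{\partial c} + G(c, m) = 0$$

From (A:174) and (A:175) it follows that

(A:176)    $$\frac{c}{2}\, H(c, m) + m\, \frac{\partial H(c, m)}{\partial c} = 0$$

Hence

(A:177)    $$\log H(c, m) = -\frac{c^2}{4m} + \log \lambda(m)$$

where $\lambda(m)$ is some function of $m$ only. Thus

(A:178)    $$H(c, m) = \lambda(m)\, e^{-\frac{c^2}{4m}}$$

Now we shall determine $\lambda(m)$. We have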

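(A:179)    $$\lambda(m) = H(0, m) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \frac{1}{\sqrt{1 - t}}\, e^{-mt}\, dt$$

Since $(1 - t)^{-1/2}$ is the characteristic function of $\frac{1}{2}\chi^2$ where $\chi^2$ has the
$\chi^2$-distribution with one degree of freedom, the right-hand side of
(A:179) is equal to

$$\frac{1}{\Gamma(\frac{1}{2})\sqrt{m}}\, e^{-m}$$

Hence

(A:180)    $$\lambda(m) = \frac{1}{\Gamma(\frac{1}{2})\sqrt{m}}\, e^{-m}$$

From (A:178) and (A:179) we obtain

(A:181)    $$H(c, m) = \frac{1}{\Gamma(\frac{1}{2})\sqrt{m}}\, e^{-m - \frac{c^2}{4m}}$$

From (A:174) and (A:181) we obtain

(A:182)    $$G(c, m) = \frac{c}{2\Gamma(\frac{1}{2})\, m^{3/2}}\, e^{-m - \frac{c^2}{4m}}$$

Hence the distribution of $m$ is given by

(A:183)    $$F(m)\, dm = \frac{c}{2\Gamma(\frac{1}{2})\, m^{3/2}}\, e^{-m + c - \frac{c^2}{4m}}\, dm \qquad (0 \le m < \infty)$$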

Let $m = (c/2)m^{*}$. Then the distribution of $m^{*}$ is given by

(A:184)    $$D(m^{*})\, dm^{*} = \frac{c \cdot \frac{c}{2}}{2\Gamma(\frac{1}{2})\left(\frac{c}{2}\right)^{3/2} (m^{*})^{3/2}}\, e^{-\frac{c}{2}\left(\frac{1}{m^{*}} + m^{*} - 2\right)}\, dm^{*} = \frac{\sqrt{c}}{\sqrt{2\pi}\,(m^{*})^{3/2}}\, e^{-\frac{c}{2}\left(\frac{1}{m^{*}} + m^{*} - 2\right)}\, dm^{*}$$

The function $(1/m^{*}) + m^{*} - 2$ is non-negative and is equal to 0 only
when $m^{*} = 1$. If $c$ is large, then $D(m^{*})$ is exceedingly small for values
of $m^{*}$ not close to 1. Expanding $(1/m^{*}) + m^{*} - 2$ in a Taylor series
around $m^{*} = 1$, we obtain

(A:185)    $$\frac{1}{m^{*}} + m^{*} - 2 = (m^{*} - 1)^2 + \text{higher order terms}$$

Hence for large $c$

(A:186)    $$D(m^{*})\, dm^{*} \sim \frac{\sqrt{c}}{\sqrt{2\pi}}\, e^{-\frac{c}{2}(m^{*} - 1)^2}\, dm^{*}$$

i.e., if $c$ is large $m^{*}$ is nearly normally distributed with mean equal to
1 and standard deviation $1/\sqrt{c}$.
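The density of $m$ derived above, $\dfrac{c}{2\Gamma(\frac12)\,m^{3/2}}\,e^{-m + c - c^2/(4m)}$, can be examined numerically. The sketch below is a rough check under stated assumptions: it uses $\Gamma(\frac{1}{2}) = \sqrt{\pi}$, rewrites the exponent as $-(m - c/2)^2/m$ (an algebraically equivalent form chosen here to avoid overflow), and integrates with a plain trapezoidal rule on a truncated range. The function names, the value $c = 4$, and the grid are all hypothetical choices.

```python
import math


def density_m(m, c):
    """Density of m: c / (2 sqrt(pi) m^{3/2}) * exp(-(m - c/2)^2 / m)."""
    if m <= 0.0:
        return 0.0
    return c / (2.0 * math.sqrt(math.pi) * m ** 1.5) * math.exp(-(m - 0.5 * c) ** 2 / m)


def density_m_star(x, c):
    """Density of m* = 2m/c."""
    if x <= 0.0:
        return 0.0
    return math.sqrt(c / (2.0 * math.pi)) * x ** -1.5 * math.exp(-0.5 * c * (1.0 / x + x - 2.0))


def normal_approx(x, c):
    """Large-c approximation: normal with mean 1 and standard deviation 1/sqrt(c)."""
    return math.sqrt(c / (2.0 * math.pi)) * math.exp(-0.5 * c * (x - 1.0) ** 2)


def trapezoid(f, lo, hi, n):
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h


c = 4.0
total_mass = trapezoid(lambda m: density_m(m, c), 0.0, 40.0, 40000)
mean_m = trapezoid(lambda m: m * density_m(m, c), 0.0, 40.0, 40000)
```

The total mass should be close to 1 and the mean of $m$ close to $c/2$, which corresponds to $E(n) = (\log A)/\mu$. At $m^{*} = 1$ the exact density of $m^{*}$ and its normal approximation coincide.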


A.6.2 The Case When B > 0 and A = oo 

In this case we have assumed that $E(z) = \mu < 0$. It can easily be
shown that the distribution of $m = (\mu^2/2\sigma^2)n$ is now given by the expression
we obtain from (A:183) if we substitute $(\mu/\sigma^2)\log B$ for $c$.

A.6.3 The Case When B > 0 and A Is Finite 

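In this case the approximate characteristic function of $n$, if the
excess of $Z_n$ over the boundaries is neglected, is given by (A:144)
where $t_1(\tau)$ and $t_2(\tau)$ are equal to the right-hand members of (A:146)
and (A:147), respectively. Let

$$m = \frac{\mu^2}{2\sigma^2}\, n \quad\text{and}\quad d = -\frac{\mu}{\sigma^2}$$

Then the characteristic function of $m$ is given by

(A:187)    $$\bar\phi(t) = \frac{A^{h_1} - A^{h_2} + B^{h_2} - B^{h_1}}{A^{h_1} B^{h_2} - A^{h_2} B^{h_1}}$$

where

(A:188)    $$h_1 = d(1 - \sqrt{1 - t}), \qquad h_2 = d(1 + \sqrt{1 - t})$$

and $t$ is an imaginary variable. Letting $A^{d} = \bar A$, $B^{d} = \bar B$, $da = \bar a$,
and $db = \bar b$, the characteristic function of $m$ can be written as

(A:189)    $$\bar\phi(t) = \frac{\bar A\,(e^{-\bar a\sqrt{1 - t}} - e^{\bar a\sqrt{1 - t}}) + \bar B\,(e^{\bar b\sqrt{1 - t}} - e^{-\bar b\sqrt{1 - t}})}{\bar A \bar B\,(e^{(\bar b - \bar a)\sqrt{1 - t}} - e^{(\bar a - \bar b)\sqrt{1 - t}})} = \frac{\bar A\,(e^{-\bar b\sqrt{1 - t}} - e^{(2\bar a - \bar b)\sqrt{1 - t}}) + \bar B\,(e^{\bar a\sqrt{1 - t}} - e^{(\bar a - 2\bar b)\sqrt{1 - t}})}{\bar A \bar B\,(1 - e^{2(\bar a - \bar b)\sqrt{1 - t}})}$$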

It will be sufficient to consider only the case when $\mu > 0$, since the
case when $\mu < 0$ can be treated in a similar way. Then $\bar a < 0$ and
$\bar b > 0$. Since the real part of $+\sqrt{1 - t}$ is greater than or equal to 1
we have

(A:190)    $$\left|e^{2(\bar a - \bar b)\sqrt{1 - t}}\right| < 1$$

for any imaginary value of $t$. Let

(A:191)    $$T = e^{2(\bar a - \bar b)\sqrt{1 - t}}$$

Then

(A:192)    $$\frac{1}{1 - T} = \sum_{j=0}^{\infty} T^{j}$$

From (A:189) and (A:192) it follows that $\bar\phi(t)$ can be written in the
form of an infinite series:


(A:193) 


Kt) = 




i = l 


where $\lambda_i$ and the coefficients of the series are constants and $\lambda_i > 0$. Each term
of this series is a characteristic function of the form given in (A:167) except for a
proportionality factor. Let $F_i(m)$ be the distribution of $m$ corresponding
to the characteristic function $e^{\lambda_i[1 - \sqrt{1 - t}]}$. Then $F_i(m)$ can be obtained
from (A:183) by substituting $\lambda_i$ for $c$. Since we may integrate
the right-hand member of (A:193) term by term, the distribution of
$m$ is given by


(A:194) 


F(m) dm = 


frf 


dm 


A.6.4 Some Remarks 


Since $m$ is a discrete variable, it may seem paradoxical that we
obtained a probability density function for $m$. However, the explanation
lies in the fact that we neglected $\epsilon = \bar Z_n - Z_n$ and this quantity
is 0 only in the limiting case when $\mu$ and $\sigma$ approach 0.

If $|\mu|$ and $\sigma$ are sufficiently small as compared with $\log A$ and
$|\log B|$, the distribution of $m$ given in (A:194) will be a good approximation
to the exact distribution of $m$, even if $z$ is not normally distributed.
The reason for this can be indicated as follows. Let

(A:195)    $$z_i^{*} = \sum_{j=(i-1)r+1}^{ir} z_j \qquad (i = 1, 2, \cdots, \text{ad inf.})$$

where $r$ is a given positive integer. Since the variates $z_j$ are independently
distributed, each having the same distribution, under some
weak conditions the variates $z_i^{*}$ ($i = 1, 2, \cdots$, ad inf.) will be nearly
normally distributed for large $r$. Hence, considering the cumulative
sums $Z_i^{*} = z_1^{*} + z_2^{*} + \cdots + z_i^{*}$ ($i = 1, 2, \cdots$, ad inf.), the distribution
given in (A:194) is applicable with good approximation, provided
that $r|\mu|$ and $\sqrt{r}\,\sigma$ are small compared with $\log A$ and $|\log B|$ so
that the difference $\epsilon^{*} = \bar Z_n^{*} - Z_n^{*}$ can be neglected.

It would be desirable to derive limits for the error in the cumulative
distribution of $m$ caused by neglecting $\bar Z_n - Z_n$. No such limits have
yet been obtained.


A.7 EFFICIENCY OF THE SEQUENTIAL PROBABILITY RATIO TEST 


Let $S$ be any sequential test for testing $H_0$ against $H_1$ such that
the probability of an error of the first kind is $\alpha$, the probability
of an error of the second kind is $\beta$, and the probability that the
test procedure will eventually terminate is 1. Let $S'$ be the sequential
probability ratio test whose strength is equal to that of $S$. We shall
prove that the sequential probability ratio test is an optimum test, i.e.,
that $E_i(n \mid S) \ge E_i(n \mid S')$ ($i = 0, 1$), if for $S'$ the
excess of $Z_n$ over $\log A$ and $\log B$ can be neglected.$^1$ This
excess is exactly 0 if $z$ can take only the values $d$ and $-d$ and if
$\log A$ and $\log B$ are integral multiples of $d$. In any other case the
excess will not be identically 0. However, if $|E(z)|$ and the standard
deviation $\sigma_z$ of $z$ are sufficiently small, the excess of $Z_n$
over $\log A$ and $\log B$ is negligible.

For any random variable $u$, we shall denote by $E_i^{*}(u \mid S)$ the
conditional expected value of $u$ under the hypothesis $H_i$ ($i = 0, 1$)
and under the restriction that $H_0$ is accepted. Similarly, let
$E_i^{**}(u \mid S)$ be the conditional expected value of $u$ under the
hypothesis $H_i$ ($i = 0, 1$) and under the restriction that $H_1$ is
accepted. In the notations for these expected values, the symbol $S$
stands for the sequential test used. Let $Q_i(S)$ denote the totality of
all samples for which the test $S$ leads to the acceptance of $H_i$. Then
we have


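(A:196)    $E_0^{*}\left(\frac{p_{1n}}{p_{0n}} \mid S\right) = \frac{P_1[Q_0(S)]}{P_0[Q_0(S)]} = \frac{\beta}{1-\alpha}$

(A:197)    $E_0^{**}\left(\frac{p_{1n}}{p_{0n}} \mid S\right) = \frac{P_1[Q_1(S)]}{P_0[Q_1(S)]} = \frac{1-\beta}{\alpha}$

(A:198)    $E_1^{*}\left(\frac{p_{0n}}{p_{1n}} \mid S\right) = \frac{P_0[Q_0(S)]}{P_1[Q_0(S)]} = \frac{1-\alpha}{\beta}$

and

(A:199)    $E_1^{**}\left(\frac{p_{0n}}{p_{1n}} \mid S\right) = \frac{P_0[Q_1(S)]}{P_1[Q_1(S)]} = \frac{\alpha}{1-\beta}$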


$^1$ $E_i(n \mid S)$ denotes the expected value of $n$ when $H_i$ is true
($\theta = \theta_i$) and the sequential test $S$ is used.

To prove the optimum property of the sequential probability ratio
test, we shall first derive two lemmas.

Lemma A.2. For any random variable $u$ the inequality

(A:200)    $e^{E(u)} \le E(e^{u})$

holds.

Proof. Inequality (A:200) can be written as

(A:201)    $1 \le E(e^{u'})$

where $u' = u - E(u)$. Lemma A.2 is proved if we show that (A:201)
holds for any random variable $u'$ with zero mean. Expanding $e^{u'}$ in a
Taylor series around $u' = 0$, we obtain

(A:202)    $e^{u'} = 1 + u' + \tfrac{1}{2} u'^2 e^{\xi(u')}$

where $\xi(u')$ lies between 0 and $u'$. Hence

(A:203)    $E(e^{u'}) = 1 + \tfrac{1}{2} E\left(u'^2 e^{\xi(u')}\right) \ge 1$

and lemma A.2 is proved.
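
Inequality (A:200) can be checked numerically for any finite distribution.
The following is a minimal sketch; the function name `jensen_gap` and the
example distribution are assumptions made for illustration and do not come
from the text.

```python
import math

def jensen_gap(values, probs):
    """Return E(e^u) - e^{E(u)} for a discrete random variable u.

    values: possible values of u; probs: their probabilities (sum to 1).
    """
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    mean_u = sum(p * v for v, p in zip(values, probs))
    mean_exp_u = sum(p * math.exp(v) for v, p in zip(values, probs))
    return mean_exp_u - math.exp(mean_u)

# Illustrative three-point distribution (made-up numbers).
gap = jensen_gap([-1.0, 0.5, 2.0], [0.2, 0.5, 0.3])
print(gap)  # non-negative, as (A:200) requires

# For a constant u the two sides of (A:200) coincide.
print(jensen_gap([0.7], [1.0]))
```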

Lemma A.3. Let $S$ be a sequential test such that there exists a finite
integer $N$ with the property that the number $n$ of observations required
for the test is $\le N$. Then$^2$

(A:204)    $E_i(n \mid S) = \frac{E_i\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right)}{E_i(z)} \qquad (i = 0, 1)$

The proof is omitted, since it is essentially the same as that of
equation (A:69) for the sequential probability ratio test.
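
Equation (A:204) can be verified exactly for a small truncated test. The
sketch below is an illustration under stated assumptions: the observations
are Bernoulli, the stopping rule is a probability ratio test truncated at
`n_max`, and the name `wald_identity_check` and all numerical values are
hypothetical. It enumerates the sample paths and returns $E(n)$, $E(Z_n)$
and $E(z)$, so that $E(Z_n) = E(n)\,E(z)$ can be compared directly.

```python
import math

def wald_identity_check(p_true, p0, p1, log_b, log_a, n_max):
    """Exact E(n), E(Z_n), E(z) for a truncated Bernoulli test.

    Sampling stops when Z_n >= log_a, Z_n <= log_b, or n == n_max.
    """
    z_one = math.log(p1 / p0)                # increment when x = 1
    z_zero = math.log((1 - p1) / (1 - p0))   # increment when x = 0
    alive = {0: 1.0}   # number of ones so far -> probability of still sampling
    exp_n = 0.0
    exp_z = 0.0
    for j in range(1, n_max + 1):
        nxt = {}
        for ones, prob in alive.items():
            for step, q in ((1, p_true), (0, 1.0 - p_true)):
                k = ones + step
                z_sum = k * z_one + (j - k) * z_zero
                weight = prob * q
                if z_sum >= log_a or z_sum <= log_b or j == n_max:
                    exp_n += j * weight       # test terminates at stage j
                    exp_z += z_sum * weight
                else:
                    nxt[k] = nxt.get(k, 0.0) + weight
        alive = nxt
    mean_z = p_true * z_one + (1.0 - p_true) * z_zero
    return exp_n, exp_z, mean_z

# Hypothetical example: H0: p = 0.3, H1: p = 0.5, data generated under H0.
exp_n, exp_z, mean_z = wald_identity_check(
    0.3, 0.3, 0.5, -math.log(9.0), math.log(9.0), 40)
print(exp_n, exp_z / mean_z)  # the two numbers agree
```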

On the basis of lemmas A.2 and A.3 we shall be able to derive the
following theorem.

Theorem: Let $S$ be any sequential test for which the probability of an
error of the first kind is $\alpha$, the probability of an error of the
second kind is $\beta$, and the probability that the test procedure will
eventually terminate is equal to 1. Then

(A:205)    $E_0(n \mid S) \ge \frac{1}{E_0(z)}\left[(1-\alpha)\log\frac{\beta}{1-\alpha} + \alpha\log\frac{1-\beta}{\alpha}\right]$

and

(A:206)    $E_1(n \mid S) \ge \frac{1}{E_1(z)}\left[\beta\log\frac{\beta}{1-\alpha} + (1-\beta)\log\frac{1-\beta}{\alpha}\right]$

$^2$ The validity of (A:204) has been established under very general
conditions even when the probability that $n > N$ is positive for any $N$.
See the author's article, "Some Generalizations of the Theory of Cumulative
Sums," The Annals of Mathematical Statistics, Vol. 16 (1945), and
D. Blackwell, "On an Equation of Wald," The Annals of Mathematical
Statistics, Vol. 17 (1946).

Proof. First we shall prove the theorem in the case when there
exists a finite integer $N$ such that $n$ never exceeds $N$. According to
lemma A.3 we have


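(A:207)    $E_0(n \mid S) = \frac{1}{E_0(z)}\, E_0\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right)$

           $= \frac{1}{E_0(z)}\left[(1-\alpha)\, E_0^{*}\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right) + \alpha\, E_0^{**}\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right)\right]$

and

(A:208)    $E_1(n \mid S) = \frac{1}{E_1(z)}\, E_1\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right)$

           $= \frac{1}{E_1(z)}\left[\beta\, E_1^{*}\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right) + (1-\beta)\, E_1^{**}\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right)\right]$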


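From equations (A:196) through (A:199) and lemma A.2 we obtain
the inequalities

(A:209)    $E_0^{*}\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right) \le \log\frac{\beta}{1-\alpha}$

(A:210)    $E_0^{**}\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right) \le \log\frac{1-\beta}{\alpha}$

(A:211)    $E_1^{*}\left(\log\frac{p_{0n}}{p_{1n}} \mid S\right) = -E_1^{*}\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right) \le \log\frac{1-\alpha}{\beta}$

and

(A:212)    $E_1^{**}\left(\log\frac{p_{0n}}{p_{1n}} \mid S\right) = -E_1^{**}\left(\log\frac{p_{1n}}{p_{0n}} \mid S\right) \le \log\frac{\alpha}{1-\beta}$

Since $E_0(z) < 0$, dividing by $E_0(z)$ reverses the inequalities, and
(A:205) follows from (A:207), (A:209), and (A:210). Similarly, since
$E_1(z) > 0$, (A:206) follows from (A:208), (A:211), and (A:212). This
proves the theorem when a finite integer $N$ exists such that $n \le N$.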

To prove the theorem for any sequential test $S$ of strength
$(\alpha, \beta)$, let $S_N$ be the sequential test we obtain by truncating
$S$ at the $N$th observation if no decision is reached before the $N$th
observation. Let $(\alpha_N, \beta_N)$ be the strength of $S_N$. Then we
have

(A:213)    $E_0(n \mid S) \ge E_0(n \mid S_N) \ge \frac{1}{E_0(z)}\left[(1-\alpha_N)\log\frac{\beta_N}{1-\alpha_N} + \alpha_N\log\frac{1-\beta_N}{\alpha_N}\right]$

and

(A:214)    $E_1(n \mid S) \ge E_1(n \mid S_N) \ge \frac{1}{E_1(z)}\left[\beta_N\log\frac{\beta_N}{1-\alpha_N} + (1-\beta_N)\log\frac{1-\beta_N}{\alpha_N}\right]$

Since $\lim_{N \to \infty} \alpha_N = \alpha$ and
$\lim_{N \to \infty} \beta_N = \beta$, inequalities (A:205) and (A:206)
follow from (A:213) and (A:214). Hence the proof of the theorem is
completed.
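
The right-hand members of (A:205) and (A:206) are easy to evaluate once
$\alpha$, $\beta$, $E_0(z)$ and $E_1(z)$ are known. The sketch below is
illustrative only: the function name `expected_n_lower_bounds` is
hypothetical, and the example assumes normal observations with unit
variance, $\theta_0 = 0$ and $\theta_1 = 1$, for which $z = x - 1/2$ and
hence $E_0(z) = -1/2$, $E_1(z) = 1/2$.

```python
import math

def expected_n_lower_bounds(alpha, beta, e0_z, e1_z):
    """Right-hand members of (A:205) and (A:206)."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    if not (e0_z < 0.0 < e1_z):
        raise ValueError("the theorem assumes E_0(z) < 0 < E_1(z)")
    log_ratio_accept = math.log(beta / (1.0 - alpha))   # log[beta/(1-alpha)]
    log_ratio_reject = math.log((1.0 - beta) / alpha)   # log[(1-beta)/alpha]
    bound_h0 = ((1.0 - alpha) * log_ratio_accept
                + alpha * log_ratio_reject) / e0_z
    bound_h1 = (beta * log_ratio_accept
                + (1.0 - beta) * log_ratio_reject) / e1_z
    return bound_h0, bound_h1

# Assumed example values: alpha = beta = 0.05, E_0(z) = -0.5, E_1(z) = 0.5.
bound_h0, bound_h1 = expected_n_lower_bounds(0.05, 0.05, -0.5, 0.5)
print(bound_h0, bound_h1)
```

In this symmetric example both bounds reduce to $1.8 \log 19$, so no test
of strength $(0.05, 0.05)$ can average fewer observations than that under
either hypothesis.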

If for the sequential probability ratio test $S'$ the excess of the
cumulative sum $Z_n$ over the boundaries $\log A$ and $\log B$ is 0,
$E_0(n \mid S')$ is exactly equal to the right-hand member of (A:205) and
$E_1(n \mid S')$ is exactly equal to the right-hand member of (A:206).
Hence, in this case, $S'$ is exactly an optimum test. If both $|E(z)|$ and
$\sigma_z$ are small, the expected value of the excess over the boundaries
will also be small and, therefore, $E_0(n \mid S')$ and $E_1(n \mid S')$
will be only slightly larger than the right-hand members of (A:205) and
(A:206), respectively. Thus, in such a case, the sequential probability
ratio test is, if not exactly, very nearly an optimum test.$^3$
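
The case of zero excess can be worked through exactly. The sketch below
assumes a Bernoulli variable tested for $p_0$ against $p_1 = 1 - p_0$, so
that $z$ takes only the values $d$ and $-d$ with $d = \log[(1-p_0)/p_0]$,
and boundaries $\log A = a d$, $\log B = -b d$ with integers $a$, $b$. It
uses the classical gambler's-ruin formulas for the absorption probability
and the expected duration; the function name and the numerical values are
hypothetical. Under these assumptions $E_0(n \mid S')$ coincides with the
right-hand member of (A:205).

```python
import math

def exact_symmetric_sprt(p0, a, b):
    """Exact alpha, beta, E_0(n) and the bound (A:205) when z = +d or -d."""
    if not (0.0 < p0 < 0.5):
        raise ValueError("this sketch assumes 0 < p0 < 1/2")
    d = math.log((1.0 - p0) / p0)   # step size of the random walk Z_n

    def prob_hit_upper(p):
        # Probability of reaching +a before -b when each step is +1 with
        # probability p and -1 with probability 1 - p (p != 1/2).
        rho = (1.0 - p) / p
        return (1.0 - rho ** b) / (1.0 - rho ** (a + b))

    def mean_duration(p):
        up = prob_hit_upper(p)
        return (a * up - b * (1.0 - up)) / (2.0 * p - 1.0)

    alpha = prob_hit_upper(p0)               # H0 rejected although true
    beta = 1.0 - prob_hit_upper(1.0 - p0)    # H0 accepted although H1 true
    e0_z = d * (2.0 * p0 - 1.0)              # E_0(z), negative
    bound = ((1.0 - alpha) * math.log(beta / (1.0 - alpha))
             + alpha * math.log((1.0 - beta) / alpha)) / e0_z
    return alpha, beta, mean_duration(p0), bound

alpha, beta, mean_n_h0, bound_h0 = exact_symmetric_sprt(0.4, 3, 3)
print(alpha, beta, mean_n_h0, bound_h0)  # last two values agree
```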

If $\theta_1$ approaches $\theta_0$, then the ratios of the upper limits of
$E_0(n \mid S')$ and $E_1(n \mid S')$, as implied by (A:77) and (A:78), to
the right-hand members of (A:205) and (A:206), respectively, converge to 1.
Thus, the efficiency of the sequential probability ratio test, if not
exactly 1, converges to 1 when $\theta_1 \to \theta_0$.$^4$ The upper
bounds for $E_0(n \mid S')$ and $E_1(n \mid S')$ given in (A:77) and (A:78)
determine lower bounds for the efficiency of the sequential probability
ratio test $S'$.

$^3$ The author conjectures that the sequential probability ratio test is
exactly an optimum test even if the excess of $Z_n$ over the boundaries is
not 0. However, he did not succeed in proving this.

$^4$ For the definition of the efficiency of a sequential test see Section
2.4.1.


A.8 DETERMINATION OF AN OPTIMUM WEIGHT FUNCTION $w(\theta)$ IN
SOME SPECIAL CASES OF TESTING SIMPLE HYPOTHESES WITH
NO RESTRICTIONS ON THE POSSIBLE ALTERNATIVE VALUES OF
THE PARAMETERS

A.8.1 A Class of Cases for Which an Optimum Weight Function $w(\theta)$
Can Be Determined by a Simple Procedure

Let $(\theta_1, \ldots, \theta_k) = (\theta_1^0, \ldots, \theta_k^0)$ be the
simple hypothesis $H_0$ to be tested and denote the distribution of $x$ by
$f(x, \theta_1, \ldots, \theta_k)$. Assume the boundary of the zone
$\omega_r$ of preference for rejection is a surface in the parameter space
and denote it by $S_r$. Assume, further, that it is possible to find a
non-negative function $v(\theta)$ of the parameter $\theta$ such that the
surface integral$^1$

(A:215)    $\int_{S_r} v(\theta)\, dS = 1$

and the sequential probability ratio test based on the ratio

(A:216)    $\frac{p_{1n}}{p_{0n}} = \frac{\int_{S_r} f(x_1, \theta_1, \ldots, \theta_k) \cdots f(x_n, \theta_1, \ldots, \theta_k)\, v(\theta)\, dS}{f(x_1, \theta_1^0, \ldots, \theta_k^0) \cdots f(x_n, \theta_1^0, \ldots, \theta_k^0)}$

satisfies the following two conditions (for any values $A$ and $B$): (1)
The probability $\beta(\theta)$ of committing an error of the second kind
(of accepting $H_0$ when $\theta$ is true) is constant over the surface
$S_r$; (2) for any point $\theta$ in the interior of $\omega_r$ the value
of $\beta(\theta)$ does not exceed the constant value of $\beta(\theta)$ on
the surface $S_r$.

We shall now show that $v(\theta)$ may be regarded as an optimum weight
function in the sense defined in Section 4.1.3, and the probability ratio
test based on the ratio (A:216) provides a solution to our problem.
In fact, the weight function $v(\theta)$ over the surface $S_r$ can be
considered a limiting case of a weight function $w(\theta)$ which takes the
value 0 for any $\theta$ in the interior of $\omega_r$ whose distance from
the boundary exceeds some positive $\Delta$, with $\Delta$ approaching 0 in
the limit. It follows from conditions (1) and (2) that for the weight
function $v(\theta)$ the maximum of $\beta(\theta)$ in $\omega_r$ is equal
to the weighted integral of $\beta(\theta)$, i.e., to
$\int_{S_r} \beta(\theta) v(\theta)\, dS$. Consider now any other weight
function $w^*(\theta)$ and denote the resulting probability of an error of
the second kind by $\beta^*(\theta)$ when $w^*(\theta)$ is used instead of
$v(\theta)$. It has been shown in Section 4.1.3 that the following
relations hold with sufficient approximation for practical purposes:

(A:217)    $\int_{\omega_r} w^*(\theta)\beta^*(\theta)\, d\theta = \int_{S_r} v(\theta)\beta(\theta)\, dS = \frac{B(A-1)}{A-B}$

Hence the maximum of $\beta^*(\theta)$ in $\omega_r$ is
$\ge B(A - 1)/(A - B)$. The optimum property of the weight function
$v(\theta)$ follows then from the fact that, when $v(\theta)$ is used, the
maximum of $\beta(\theta)$ is equal to $B(A - 1)/(A - B)$.

$^1$ $dS$ denotes the infinitesimal surface element.

In several important statistical problems one can easily find a weight
function $v(\theta)$ such that conditions (1) and (2) are fulfilled. We
shall show, for example, that such a weight function $v(\theta)$ can easily
be determined for testing the means of normally distributed variables with
known variances. After the weight function $v(\theta)$ has been found, for
practical purposes we may let $A = (1 - \beta)/\alpha$ and
$B = \beta/(1 - \alpha)$, where $\alpha$ is the required value of the
probability of an error of the first kind and $\beta$ is the required upper
limit for $\beta(\theta)$.
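
The choice $A = (1 - \beta)/\alpha$, $B = \beta/(1 - \alpha)$ can be
checked against the quantity $B(A - 1)/(A - B)$ appearing in (A:217): with
these values it reduces to $\beta$, and the companion expression
$(1 - B)/(A - B)$ reduces to $\alpha$. The following sketch verifies this
arithmetic; the function names are hypothetical and the numbers are
illustrative.

```python
def wald_thresholds(alpha, beta):
    """Return (A, B) with A = (1 - beta)/alpha and B = beta/(1 - alpha)."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0 and alpha + beta < 1.0):
        raise ValueError("need 0 < alpha, beta and alpha + beta < 1")
    return (1.0 - beta) / alpha, beta / (1.0 - alpha)

def approximate_error_probabilities(a, b):
    """Return ((1 - B)/(A - B), B(A - 1)/(A - B))."""
    return (1.0 - b) / (a - b), b * (a - 1.0) / (a - b)

A, B = wald_thresholds(0.05, 0.10)
first_kind, second_kind = approximate_error_probabilities(A, B)
print(A, B)                     # boundaries of the test
print(first_kind, second_kind)  # recovers alpha and beta up to rounding
```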

Although we have so far assumed that $X$ is a single random variable, all
the results remain obviously valid when $X$ is a random vector, i.e., $X$
represents a set of $p$ ($p > 1$) random variables $X_1, \ldots, X_p$.
The only change in the formulas is that the $\alpha$th observation
$x_\alpha$ will have to be replaced by a set
$(x_{1\alpha}, \ldots, x_{p\alpha})$ of $p$ values where $x_{i\alpha}$
represents the $\alpha$th observation on $X_i$.


A.8.2 Application to Testing the Means of Independently and Normally
Distributed Random Variables with Known Variances

Let $X_1, \ldots, X_k$ be $k$ normally and independently distributed
random variables with a common known variance $\sigma^2$. The mean values
$\theta_1, \ldots, \theta_k$ are assumed to be unknown. Suppose that it is
required to test the hypothesis that
$(\theta_1, \ldots, \theta_k) = (\theta_1^0, \ldots, \theta_k^0)$. Assume
that the zone $\omega_r$ of preference for rejection is given by

           $+\sqrt{(\theta_1 - \theta_1^0)^2 + \cdots + (\theta_k - \theta_k^0)^2} \ge \delta\sigma$

where $\delta$ is some given positive value. Then the boundary $S_r$ of
$\omega_r$ is a sphere with center
$\theta^0 = (\theta_1^0, \ldots, \theta_k^0)$ and radius $\delta\sigma$.
Let $v(\theta)$ be constant over $S_r$ and equal to the reciprocal of the
area of $S_r$. We shall show that for this weight function conditions (1)
and (2) of the preceding section are fulfilled. For this purpose, we shall
first prove that the ratio (A:216) is a monotonically increasing function
of $(\bar{x}_1 - \theta_1^0)^2 + \cdots + (\bar{x}_k - \theta_k^0)^2$ where
$\bar{x}_i$ is the arithmetic mean of the observations on $X_i$. In fact,
in our case the ratio (A:216) reduces to

(A:218)    $\frac{c \int_{S_r} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{\alpha=1}^{n}(x_{i\alpha}-\theta_i)^2\right] dS}{\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{\alpha=1}^{n}(x_{i\alpha}-\theta_i^0)^2\right]} = c\, e^{-n\delta^2/2} \int_{S_r} \exp\left[\frac{n}{\sigma^2}\sum_{i=1}^{k}(\bar{x}_i-\theta_i^0)(\theta_i-\theta_i^0)\right] dS$

where $c$ is equal to the reciprocal of the area of $S_r$. Let $r_x$ denote
$\sqrt{\sum_{i=1}^{k}(\bar{x}_i - \theta_i^0)^2/\sigma^2}$ and
$\rho(\theta)$ ($0 \le \rho \le \pi$) denote the angle between the vector
$(\bar{x}_1 - \theta_1^0, \ldots, \bar{x}_k - \theta_k^0)$ and the vector
$(\theta_1 - \theta_1^0, \ldots, \theta_k - \theta_k^0)$. Then (A:218) can
be written as

(A:219)    $c\, e^{-n\delta^2/2} \int_{S_r} e^{n r_x \delta \cos[\rho(\theta)]}\, dS$

Because of the symmetry of the sphere, the value of (A:219) will not
be changed if we substitute $\gamma(\theta)$ for $\rho(\theta)$, where
$\gamma(\theta)$ ($0 \le \gamma \le \pi$) denotes the angle between the
vector $\theta - \theta^0$ and an arbitrarily chosen fixed vector $u$. From
this it follows that the value of (A:219) depends only on $r_x$.

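Now we shall show that (A:219) is a strictly increasing function of
$r_x$. For this purpose we merely have to show that

(A:220)    $I(r_x) = \int_{S_r} e^{n r_x \delta \cos[\gamma(\theta)]}\, dS$

is a strictly increasing function of $r_x$. We have

(A:221)    $\frac{dI(r_x)}{dr_x} = \int_{S_r} n\delta \cos[\gamma(\theta)]\, e^{n r_x \delta \cos[\gamma(\theta)]}\, dS$

Denote by $S'_r$ the subset of $S_r$ in which
$0 \le \gamma(\theta) < \pi/2$, and by $S''_r$ the subset in which
$\pi/2 < \gamma(\theta) \le \pi$. Because of the symmetry of the sphere we
have

(A:222)    $\int_{S''_r} n\delta \cos[\gamma(\theta)]\, e^{n r_x \delta \cos[\gamma(\theta)]}\, dS = \int_{S'_r} n\delta \cos[\pi - \gamma(\theta)]\, e^{n r_x \delta \cos[\pi - \gamma(\theta)]}\, dS$

Hence

(A:223)    $\frac{dI(r_x)}{dr_x} = \int_{S'_r} n\delta \cos[\gamma(\theta)]\, e^{n r_x \delta \cos[\gamma(\theta)]}\, dS - \int_{S'_r} n\delta \cos[\gamma(\theta)]\, e^{-n r_x \delta \cos[\gamma(\theta)]}\, dS$

           $= n\delta \int_{S'_r} \cos[\gamma(\theta)] \left\{ e^{n\delta r_x \cos[\gamma(\theta)]} - e^{-n\delta r_x \cos[\gamma(\theta)]} \right\} dS$

The right-hand side of (A:223) is positive for $r_x > 0$. Hence, we have
proved that expression (A:219) or (A:218) is a strictly increasing function
of $r_x$.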
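
The monotonicity of (A:219) in $r_x$ can also be seen numerically. The
sketch below rests on one standard fact that is not stated in the text:
for a point uniformly distributed on a sphere in $k$ dimensions, the angle
$\gamma$ to a fixed direction has density proportional to
$\sin^{k-2}\gamma$ on $[0, \pi]$. Since $c$ is the reciprocal of the area
of $S_r$, (A:219) equals $e^{-n\delta^2/2}$ times the average of
$e^{n r_x \delta \cos\gamma}$ over the sphere. The function names and the
midpoint-rule quadrature are illustrative choices.

```python
import math

def sphere_average_exp(a, k, steps=2000):
    """Average of exp(a * cos(gamma)) over a sphere in k >= 2 dimensions.

    Uses the density sin(gamma)**(k - 2) of the polar angle and a
    midpoint rule on [0, pi].
    """
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")
    h = math.pi / steps
    numerator = 0.0
    denominator = 0.0
    for j in range(steps):
        gamma = (j + 0.5) * h
        weight = math.sin(gamma) ** (k - 2)
        numerator += math.exp(a * math.cos(gamma)) * weight
        denominator += weight
    return numerator / denominator

def ratio_a219(r_x, n, delta, k):
    """Value of (A:219) as a function of r_x."""
    return math.exp(-0.5 * n * delta ** 2) * sphere_average_exp(n * r_x * delta, k)

# Illustrative values: n = 5 observations, delta = 0.5, k = 3 means.
values = [ratio_a219(r, 5, 0.5, 3) for r in (0.0, 0.5, 1.0, 2.0)]
print(values)  # strictly increasing in r_x
```

For $k = 3$ the sphere average has the closed form $\sinh(a)/a$ with
$a = n r_x \delta$, which gives a convenient check on the quadrature.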

We shall now show that $\beta(\theta)$ is constant over any sphere
$S_r(d)$ with center $\theta^0$ and radius $d$ and that it decreases
monotonically with increasing $d$. For this purpose let $y_1, \ldots, y_k$
be an orthogonal linear transformation of
$\bar{x}_1 - \theta_1^0, \ldots, \bar{x}_k - \theta_k^0$ so that
$E(y_1) = \sqrt{(\theta_1 - \theta_1^0)^2 + \cdots + (\theta_k - \theta_k^0)^2}$
and $E(y_i) = 0$ ($i = 2, \ldots, k$). Since
$y_1^2 + \cdots + y_k^2 = (\bar{x}_1 - \theta_1^0)^2 + \cdots + (\bar{x}_k - \theta_k^0)^2$
and since (A:219) depends only on
$(\bar{x}_1 - \theta_1^0)^2 + \cdots + (\bar{x}_k - \theta_k^0)^2$, it is
seen that the sequence of expressions (A:219) formed for the sequence of
integers $n = 1, 2, \ldots$, etc., has a joint distribution which depends
only on
$\sqrt{(\theta_1 - \theta_1^0)^2 + \cdots + (\theta_k - \theta_k^0)^2}$.
Hence $\beta(\theta)$ is constant on any sphere $S_r(d)$. Since (A:219) is
a strictly monotonic function of $r_x$, it can be shown that
$\beta(\theta)$ is monotonically decreasing with increasing $d$. Hence,
conditions (1) and (2) of the preceding section are fulfilled and we can
test the hypothesis that $\theta = \theta^0$ by the sequential probability
ratio test based on the ratio (A:218).

If $k = 1$, i.e., if we test the mean value of a single random variable
$X$, the sphere $S_r$ is a null-dimensional sphere consisting of the two
points $\theta_1 = \delta\sigma$ and $\theta_2 = -\delta\sigma$ and (A:216)
reduces to the ratio of $p_{1n}$ to $p_{0n}$ given by (4:8) and (4:9),
respectively, in Section 4.1.4.


A.9 DETERMINATION OF OPTIMUM WEIGHT FUNCTIONS $w_a(\theta)$ AND
$w_r(\theta)$ IN SOME SPECIAL CASES OF TESTING COMPOSITE HYPOTHESES

A.9.1 A Class of Cases for Which Optimum Weight Functions $w_a(\theta)$
and $w_r(\theta)$ Can Be Determined by a Simple Procedure

Let $f(x, \theta_1, \ldots, \theta_k)$ denote the distribution of $x$
involving $k$ unknown parameters $\theta_1, \ldots, \theta_k$. Suppose we
wish to test the composite hypothesis $H_\omega$ that the parameter point
$\theta$ lies in the subset $\omega$ of the parameter space. Let
$\omega_a$ denote the zone of preference for acceptance and $\omega_r$ the
zone of preference for rejection. Assume that the boundary of $\omega_r$ is
a surface $S_r$. Suppose that it is possible to find two weight functions
$v_a(\theta)$ and $v_r(\theta)$ such that

           $\int_{\omega_a} v_a(\theta)\, d\theta = 1, \qquad \int_{S_r} v_r(\theta)\, dS = 1$

and that the sequential probability ratio test based on the ratio

(A:224)    $\frac{p_{1n}}{p_{0n}} = \frac{\int_{S_r} v_r(\theta) \prod_{\alpha=1}^{n} f(x_\alpha, \theta_1, \ldots, \theta_k)\, dS}{\int_{\omega_a} v_a(\theta) \prod_{\alpha=1}^{n} f(x_\alpha, \theta_1, \ldots, \theta_k)\, d\theta}$

satisfies the following conditions (for any values $A$ and $B$): (1)
$\alpha(\theta)$ is constant in $\omega_a$; (2) $\beta(\theta)$ is constant
over $S_r$; (3) for any point $\theta$ in the interior of $\omega_r$, the
value of $\beta(\theta)$ does not exceed the constant value of
$\beta(\theta)$ on $S_r$.

We shall now show that $v_a(\theta)$ and $v_r(\theta)$ may be regarded as
optimum weight functions in the sense defined in Section 4.2.2. For this
purpose, let $w_a(\theta)$ and $w_r(\theta)$ be any other weight functions
and let $\alpha^*(\theta)$ and $\beta^*(\theta)$ be the resulting
probabilities of errors of the first and second kinds when $w_a(\theta)$
and $w_r(\theta)$ are used. Since, as has been shown,

(A:225)    $\int_{\omega_a} \alpha^*(\theta) w_a(\theta)\, d\theta = \frac{1-B}{A-B}$

and

           $\int_{\omega_r} \beta^*(\theta) w_r(\theta)\, d\theta = \frac{B(A-1)}{A-B}$

hold with good approximation, we see that in $\omega_a$ the maximum of
$\alpha^*(\theta) \ge (1 - B)/(A - B)$, and in $\omega_r$ the maximum of
$\beta^*(\theta) \ge B(A - 1)/(A - B)$ with good approximation. But if
$v_a(\theta)$ and $v_r(\theta)$ are used, it follows from conditions (1),
(2), and (3) that (with good approximation) the maximum of $\alpha(\theta)$
in $\omega_a$ is equal to $(1 - B)/(A - B)$ and the maximum of
$\beta(\theta)$ in $\omega_r$ is equal to $B(A - 1)/(A - B)$. Hence these
weight functions are optimum in the sense defined in Section 4.2.2.

In some special but important statistical problems one can easily
find weight functions $v_a(\theta)$ and $v_r(\theta)$ which satisfy
conditions (1), (2), and (3). It will be seen in the next section that such
weight functions can easily be constructed when the mean of a normal
distribution with unknown variance is being tested. Again, for practical
purposes we may let $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$,
where $\alpha$ is the required upper bound of $\alpha(\theta)$ in
$\omega_a$ and $\beta$ is the required upper bound of $\beta(\theta)$ in
$\omega_r$.


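A.9.2 Application to Testing the Mean of a Normal Distribution with
Unknown Variance (Sequential t-Test)


Let $X$ be a normally distributed random variable with unknown mean
$\theta$ and unknown variance $\sigma^2$. Suppose we wish to test the
hypothesis that $\theta = \theta_0$. Furthermore, assume that $\omega_r$ is
given by the set of all points $(\theta, \sigma)$ for which
$|\theta - \theta_0|/\sigma \ge \delta$, while $\omega_a$ consists of all
points $(\theta_0, \sigma)$. Then the boundary $S_r$ of $\omega_r$ consists
of all points $(\theta, \sigma)$ for which
$|\theta - \theta_0|/\sigma = \delta$, i.e., it contains the points
$(\theta, \sigma)$ for which either $\theta = \theta_0 + \delta\sigma$ or
$\theta = \theta_0 - \delta\sigma$.

For any positive value $c$ we define the weight functions
$v_{ac}(\sigma)$ and $v_{rc}(\sigma)$ as follows: $v_{ac}(\sigma) = 1/c$ if
$0 \le \sigma \le c$ and equals 0 for all other values of $\sigma$. The
weight function $v_{rc}(\sigma)$ is equal to $1/2c$ if
$0 \le \sigma \le c$ and $\theta = \theta_0 \pm \delta\sigma$ and equal to
0 otherwise. Let

(A:226)    $p_{1n} = \int_{S_r} v_{rc}(\sigma)\, \frac{1}{(2\pi)^{n/2}\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta)^2}\, d\sigma$

           $= \frac{1}{2c}\, \frac{1}{(2\pi)^{n/2}} \int_0^c \frac{1}{\sigma^n} \left( e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0 - \delta\sigma)^2} + e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0 + \delta\sigma)^2} \right) d\sigma$

and

(A:227)    $p_{0n} = \frac{1}{c}\, \frac{1}{(2\pi)^{n/2}} \int_0^c \frac{1}{\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0)^2}\, d\sigma$

Then

(A:228)    $\frac{p_{1n}}{p_{0n}} = \frac{\frac{1}{2}\int_0^c \frac{1}{\sigma^n} \left( e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0 - \delta\sigma)^2} + e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0 + \delta\sigma)^2} \right) d\sigma}{\int_0^c \frac{1}{\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0)^2}\, d\sigma}$

We consider the limiting case when $c \to \infty$. Thus

(A:229)    $\frac{p_{1n}}{p_{0n}} = \frac{\frac{1}{2}\int_0^\infty \frac{1}{\sigma^n} \left( e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0 - \delta\sigma)^2} + e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0 + \delta\sigma)^2} \right) d\sigma}{\int_0^\infty \frac{1}{\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum(x_\alpha - \theta_0)^2}\, d\sigma}$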

The sequential probability ratio test based on the ratio (A:229) provides
a solution to our problem if it can be shown to have the following three
properties: (1) $\alpha(\theta, \sigma)$ is constant in $\omega_a$; (2)
$\beta(\theta, \sigma)$ is a function of $|\theta - \theta_0|/\sigma$
alone; (3) $\beta(\theta, \sigma)$ is monotonically decreasing with
increasing $|\theta - \theta_0|/\sigma$.

To prove these three properties, let $\bar{x}$ denote
$\frac{1}{n}\sum x_\alpha$ and $S^2$ denote $\sum(x_\alpha - \bar{x})^2$.
Since the joint distribution of a sequence of expressions
$(\bar{x} - \theta_0)/S$ corresponding to consecutive values of $n$ depends
only on $(\theta - \theta_0)/\sigma$, the first two properties are proved
if we show that the ratio (A:229) is a single-valued function of
$|\bar{x} - \theta_0|/S$.

First we show that the numerator of the ratio (A:229) is a homogeneous
function of $(x_1 - \theta_0, x_2 - \theta_0, \ldots, x_n - \theta_0)$ of
degree $-(n - 1)$. In fact, making the transformation
$\sigma = \lambda t$ we obtain

           $\int_0^\infty \frac{1}{\sigma^n} \left( e^{-\frac{1}{2\sigma^2}\sum(\lambda x_\alpha - \lambda\theta_0 - \delta\sigma)^2} + e^{-\frac{1}{2\sigma^2}\sum(\lambda x_\alpha - \lambda\theta_0 + \delta\sigma)^2} \right) d\sigma$

           $= \int_0^\infty \frac{1}{\lambda^n t^n} \left( e^{-\frac{1}{2t^2}\sum(x_\alpha - \theta_0 - \delta t)^2} + e^{-\frac{1}{2t^2}\sum(x_\alpha - \theta_0 + \delta t)^2} \right) \lambda\, dt$

           $= \frac{1}{\lambda^{n-1}} \int_0^\infty \frac{1}{t^n} \left( e^{-\frac{1}{2t^2}\sum(x_\alpha - \theta_0 - \delta t)^2} + e^{-\frac{1}{2t^2}\sum(x_\alpha - \theta_0 + \delta t)^2} \right) dt$

This proves that the numerator of (A:229) is a homogeneous function
of $x_1 - \theta_0, \ldots, x_n - \theta_0$ of degree $-(n - 1)$.
Similarly, it can be shown that the denominator of (A:229) is also a
homogeneous function of degree $-(n - 1)$. Thus, the ratio (A:229) is a
homogeneous function of zero degree in the variables
$x_1 - \theta_0, \ldots, x_n - \theta_0$.

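It can be verified that (A:229) is a function of only the two expressions
$\sum(x_\alpha - \theta_0)^2$ and $\sum(x_\alpha - \theta_0)$, i.e.,

(A:230)    $\frac{p_{1n}}{p_{0n}} = \phi\left[\sum(x_\alpha - \theta_0)^2,\ \sum(x_\alpha - \theta_0)\right]$

Let $v = \left|\sqrt{\sum(x_\alpha - \theta_0)^2}\right|$. Since (A:230) is
a homogeneous function of zero degree in
$x_1 - \theta_0, \ldots, x_n - \theta_0$, its value is not changed by
substituting $(x_\alpha - \theta_0)/v$ for $x_\alpha - \theta_0$. Hence

(A:231)    $\frac{p_{1n}}{p_{0n}} = \phi\left[\sum\left(\frac{x_\alpha - \theta_0}{v}\right)^2,\ \frac{\sum(x_\alpha - \theta_0)}{v}\right] = \phi\left[1,\ \frac{n(\bar{x} - \theta_0)}{v}\right]$

Since
$\phi[\sum(x_\alpha - \theta_0)^2, -\sum(x_\alpha - \theta_0)] = \phi[\sum(x_\alpha - \theta_0)^2, \sum(x_\alpha - \theta_0)]$,
we see that

           $\frac{p_{1n}}{p_{0n}} = \psi\left[\frac{(\bar{x} - \theta_0)^2}{v^2}\right]$

Since $(\bar{x} - \theta_0)^2/v^2$ is a single-valued function of
$|\bar{x} - \theta_0|/S$ (note that
$v^2 = S^2 + n(\bar{x} - \theta_0)^2$), we have proved that
$p_{1n}/p_{0n}$ is a single-valued function of $|\bar{x} - \theta_0|/S$.
Hence properties (1) and (2) are proved.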
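
The ratio (A:229) can be evaluated numerically. The sketch below is an
illustration, not a procedure given in the text: it applies the
substitution $\sigma = v/s$ (an assumption introduced here), under which
(A:229) becomes $e^{-n\delta^2/2}$ times the ratio of
$\int_0^\infty s^{n-2} e^{-s^2/2} \cosh(n\delta w s)\, ds$ to
$\int_0^\infty s^{n-2} e^{-s^2/2}\, ds$, with $w = (\bar{x} - \theta_0)/v$.
The function name, the truncation point of the integral and the midpoint
rule are hypothetical choices suitable only for moderate $n$ and $\delta$.

```python
import math

def sequential_t_ratio(xs, theta0, delta, steps=4000):
    """Numerical value of (A:229) for observations xs (needs len(xs) >= 2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("both integrals in (A:229) converge only for n >= 2")
    deviations = [x - theta0 for x in xs]
    v = math.sqrt(sum(d * d for d in deviations))
    if v == 0.0:
        raise ValueError("all observations equal theta0; ratio undefined here")
    w = sum(deviations) / (n * v)            # (x_bar - theta0) / v
    s_max = math.sqrt(n) * delta + 12.0      # integrand is negligible beyond
    h = s_max / steps
    numerator = 0.0
    denominator = 0.0
    for j in range(steps):
        s = (j + 0.5) * h
        base = s ** (n - 2) * math.exp(-0.5 * s * s)
        numerator += base * math.cosh(n * delta * w * s)
        denominator += base
    return math.exp(-0.5 * n * delta ** 2) * numerator / denominator

sample = [0.3, 1.1, 0.7, 1.9]                 # made-up observations
ratio = sequential_t_ratio(sample, 0.0, 0.5)
scaled = sequential_t_ratio([10.0 * x for x in sample], 0.0, 0.5)
print(ratio, scaled)  # equal: the ratio is homogeneous of degree zero
```

Because the data enter only through $w$, the computed ratio is unchanged
when all deviations $x_\alpha - \theta_0$ are multiplied by a positive
constant or change sign together, in line with (A:230) and (A:231).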

In order to prove property (3) of the sequential probability ratio
test based on the ratio (A:229), it is sufficient to show that (A:229)
is a strictly increasing function of $|\bar{x} - \theta_0|/S$. Since
$|\bar{x} - \theta_0|/S$ is a strictly increasing function of
$(\bar{x} - \theta_0)^2/v^2$, we have only to show that (A:229) is a
strictly increasing function of $[(\bar{x} - \theta_0)/v]^2$. The latter
statement is proved if we show that (A:229) increases with increasing
value of $|\bar{x} - \theta_0|$ while $v$ is kept fixed. For a fixed value
of $v$ the denominator of (A:229) is constant. Thus, we merely have to show
that the numerator of (A:229) increases with increasing
$|\bar{x} - \theta_0|$ while $v$ is kept fixed. This follows easily from
the fact that

           $e^{n(\bar{x} - \theta_0)\delta/\sigma} + e^{-n(\bar{x} - \theta_0)\delta/\sigma}$

is a strictly increasing function of $|\bar{x} - \theta_0|$.